POLYKETIDE SYNTHASE ENZYMES AND RECOMBINANT DNA
CONSTRUCTS THEREFOR

Field of the Invention

The present invention relates to polyketides and the polyketide synthase (PKS)
enzymes that produce them. The invention also relates generally to genes encoding PKS
enzymes and to recombinant host cells containing such genes and in which expression of
such genes leads to the production of polyketides. The present invention also relates to
compounds useful as medicaments having immunosuppressive and/or neurotrophic
activity. Thus, the invention relates to the fields of chemistry, molecular biology, and
agricultural, medical, and veterinary technology.

Background of the Invention 

Polyketides are a class of compounds synthesized from 2-carbon units through a
series of condensations and subsequent modifications. Polyketides occur in many types
of organisms, including fungi and mycelial bacteria, in particular, the actinomycetes.
Polyketides are biologically active molecules with a wide variety of structures, and the
class encompasses numerous compounds with diverse activities. Tetracycline,
erythromycin, epothilone, FK-506, FK-520, narbomycin, picromycin, rapamycin,
spinosyn, and tylosin are examples of polyketides. Given the difficulty in producing
polyketide compounds by traditional chemical methodology, and the typically low
production of polyketides in wild-type cells, there has been considerable interest in
finding improved or alternate means to produce polyketide compounds.

This interest has resulted in the cloning, analysis, and manipulation by
recombinant DNA technology of genes that encode PKS enzymes. The resulting
technology allows one to manipulate a known PKS gene cluster either to produce the
polyketide synthesized by that PKS at higher levels than occur in nature or in hosts that
otherwise do not produce the polyketide. The technology also allows one to produce
molecules that are structurally related to, but distinct from, the polyketides produced
from known PKS gene clusters. See, e.g., PCT publication Nos. WO 93/13663;
95/08548; 96/40968; 97/02358; 98/27203; and 98/49315; United States Patent Nos.
4,874,748; 5,063,155; 5,098,837; 5,149,639; 5,672,491; 5,712,146; 5,830,750; and
5,843,718; and Fu et al., 1994, Biochemistry 33: 9321-9326; McDaniel et al., 1993,
Science 262: 1546-1550; and Rohr, 1995, Angew. Chem. Int. Ed. Engl. 34(8): 881-888,
each of which is incorporated herein by reference.

Polyketides are synthesized in nature by PKS enzymes. These enzymes, which
are complexes of multiple large proteins, are similar to the synthases that catalyze
condensation of 2-carbon units in the biosynthesis of fatty acids. PKSs catalyze the
biosynthesis of polyketides through repeated, decarboxylative Claisen condensations
between acylthioester building blocks. The building blocks used to form complex
polyketides are typically acylthioesters, such as acetyl, butyryl, propionyl, malonyl,
hydroxymalonyl, methylmalonyl, and ethylmalonyl CoA. Other building blocks include
amino acid-like acylthioesters. PKS enzymes that incorporate such building blocks
include an activity that functions as an amino acid ligase (an AMP ligase) or as a
non-ribosomal peptide synthetase (NRPS). Two major types of PKS enzymes are known;
these differ in their composition and in the mode of synthesis of the polyketide.
These two major types of PKS enzymes are commonly referred to as Type I or
"modular" and Type II or "iterative" PKS enzymes.

In the Type I or modular PKS enzyme group, a set of separate catalytic active
sites (each active site is termed a "domain", and a set thereof is termed a "module") exists
for each cycle of carbon chain elongation and modification in the polyketide synthesis
pathway. The typical modular PKS is composed of several large polypeptides, which can
be segregated from amino to carboxy termini into a loading module, multiple extender
modules, and a releasing (or thioesterase) domain. The PKS enzyme known as
6-deoxyerythronolide B synthase (DEBS) is a Type I PKS. In DEBS, there is a loading
module, six extender modules, and a thioesterase (TE) domain. The loading module, six
extender modules, and TE of DEBS are present on three separate proteins (designated
DEBS-1, DEBS-2, and DEBS-3, with two extender modules per protein). Each of the
DEBS polypeptides is encoded by a separate open reading frame (ORF) or gene; these
genes are known as eryAI, eryAII, and eryAIII. See Caffrey et al., 1992, FEBS Letters
304: 205, and U.S. Patent No. 5,824,513, each of which is incorporated herein by
reference.

Generally, the loading module is responsible for binding the first building block
used to synthesize the polyketide and transferring it to the first extender module. The
loading module of DEBS consists of an acyltransferase (AT) domain and an acyl carrier
protein (ACP) domain. Another type of loading module utilizes an inactivated
ketosynthase (KS) domain and AT and ACP domains. This inactivated KS is in some
instances called KS^Q, where the superscript letter is the abbreviation for the amino acid,
glutamine, that is present instead of the active site cysteine required for ketosynthase
activity. In other PKS enzymes, including the FK-506 PKS, the loading module

WO 00/20601 PCT/US99/22886 

incorporates an unusual starter unit and is composed of a CoA ligase-like activity
domain. In any event, the loading module recognizes a particular acyl-CoA (usually
acetyl or propionyl but sometimes butyryl or other acyl-CoA) and transfers it as a thiol
ester to the ACP of the loading module.

The AT on each of the extender modules recognizes a particular extender-CoA
(malonyl or alpha-substituted malonyl, i.e., methylmalonyl, ethylmalonyl, and
2-hydroxymalonyl) and transfers it to the ACP of that extender module to form a thioester.
Each extender module is responsible for accepting a compound from a prior module,
binding a building block, attaching the building block to the compound from the prior
module, optionally performing one or more additional functions, and transferring the
resulting compound to the next module.

Each extender module of a modular PKS contains a KS, AT, ACP, and zero, one,
two, or three domains that modify the beta-carbon of the growing polyketide chain. A
typical (non-loading) minimal Type I PKS extender module is exemplified by extender
module three of DEBS, which contains a KS domain, an AT domain, and an ACP
domain. These three domains are sufficient to activate a 2-carbon extender unit and
attach it to the growing polyketide molecule. The next extender module, in turn, is
responsible for attaching the next building block and transferring the growing compound
to the next extender module until synthesis is complete.

Once the PKS is primed with acyl- and malonyl-ACPs, the acyl group of the
loading module is transferred to form a thiol ester (trans-esterification) at the KS of the
first extender module; at this stage, extender module one possesses an acyl-KS and a
malonyl (or substituted malonyl) ACP. The acyl group derived from the loading module
is then covalently attached to the alpha-carbon of the malonyl group to form a
carbon-carbon bond, driven by concomitant decarboxylation, and generating a new acyl-ACP
that has a backbone two carbons longer than the loading building block (elongation or
extension).
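
As a rough illustration of the elongation arithmetic just described, the following Python sketch tracks backbone length as a chain passes through successive extender modules. The function name, the three-carbon starter, and the choice of six extender modules (the number the text gives for DEBS) are illustrative assumptions and are not taken from the disclosure.

```python
def backbone_lengths(starter_carbons, extender_modules):
    """Return backbone length after loading and after each extender module.

    Assumes, as the text states, that each decarboxylative condensation
    lengthens the backbone by exactly two carbons.
    """
    lengths = [starter_carbons]
    for _ in range(extender_modules):
        lengths.append(lengths[-1] + 2)
    return lengths


# Hypothetical example: a three-carbon starter and six extender modules.
print(backbone_lengths(3, 6))  # [3, 5, 7, 9, 11, 13, 15]
```

Side chains contributed by substituted malonyl extenders (methyl, ethyl, and so on) are not counted here; only the backbone is modeled.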

The polyketide chain, growing by two carbons at each extender module, is
sequentially passed as covalently bound thiol esters from extender module to extender
module, in an assembly line-like process. The carbon chain produced by this process
alone would possess a ketone at every other carbon atom, producing a polyketone, from
which the name polyketide arises. Most commonly, however, additional enzymatic
activities modify the beta keto group of each two carbon unit just after it has been added
to the growing polyketide chain but before it is transferred to the next module.

Thus, in addition to the minimal module containing KS, AT, and ACP domains
necessary to form the carbon-carbon bond, and as noted above, other domains that
modify the beta-carbonyl moiety can be present. Thus, modules may contain a
ketoreductase (KR) domain that reduces the keto group to an alcohol. Modules may also
contain a KR domain plus a dehydratase (DH) domain that dehydrates the alcohol to a
double bond. Modules may also contain a KR domain, a DH domain, and an
enoylreductase (ER) domain that converts the double bond product to a saturated single
bond, leaving the beta-carbon as a methylene function. An extender module can also
contain other enzymatic activities, such as, for example, a methylase or dimethylase
activity.
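
The relationship between the optional domains in a module and the fate of the beta-carbon can be summarized as a simple lookup. The Python sketch below is illustrative only; the function name and the string labels are assumptions, and combinations that the text does not describe (for example DH without KR) are rejected rather than guessed at.

```python
def beta_carbon_state(domains):
    """Map the beta-keto processing domains of a module to the beta-carbon state.

    `domains` is any iterable of domain abbreviations, e.g. ["KS", "AT", "KR", "ACP"].
    Only the KR, DH, and ER entries influence the result.
    """
    reductive = frozenset(d for d in domains if d in {"KR", "DH", "ER"})
    states = {
        frozenset(): "ketone",                          # minimal module
        frozenset({"KR"}): "hydroxyl",                  # keto group reduced to alcohol
        frozenset({"KR", "DH"}): "double bond",         # alcohol dehydrated
        frozenset({"KR", "DH", "ER"}): "methylene",     # double bond saturated
    }
    if reductive not in states:
        raise ValueError("domain combination not described in the text: %s" % sorted(reductive))
    return states[reductive]


print(beta_carbon_state(["KS", "AT", "ACP"]))              # ketone
print(beta_carbon_state(["KS", "AT", "DH", "KR", "ACP"]))  # double bond
```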

After traversing the final extender module, the polyketide encounters a releasing
domain that cleaves the polyketide from the PKS and typically cyclizes the polyketide.
For example, final synthesis of 6-dEB is regulated by a TE domain located at the end of
extender module six. In the synthesis of 6-dEB, the TE domain catalyzes cyclization of
the macrolide ring by formation of an ester linkage. In FK-506, FK-520, rapamycin, and
similar polyketides, the TE activity is replaced by a RapP (for rapamycin) or RapP-like
activity that makes a linkage incorporating a pipecolate residue. The enzymatic
activity that catalyzes this incorporation for the rapamycin enzyme is known as RapP,
encoded by the rapP gene. The polyketide can be modified further by tailoring enzymes;
these enzymes add carbohydrate groups or methyl groups, or make other modifications,
i.e., oxidation or reduction, on the polyketide core molecule. For example, 6-dEB is
hydroxylated at C-6 and C-12 and glycosylated at C-3 and C-5 in the synthesis of
erythromycin A.

In Type I PKS polypeptides, the order of catalytic domains is conserved. When
all beta-keto processing domains are present in a module, the order of domains in that
module from N-to-C-terminus is always KS, AT, DH, ER, KR, and ACP. Some or all of
the beta-keto processing domains may be missing in particular modules, but the order of
the domains present in a module remains the same. The order of domains within modules
is believed to be important for proper folding of the PKS polypeptides into an active
complex. Importantly, there is considerable flexibility in PKS enzymes, which allows for
the genetic engineering of novel catalytic complexes. The engineering of these enzymes
is achieved by modifying, adding, or deleting domains, or replacing them with those
taken from other Type I PKS enzymes. It is also achieved by deleting, replacing, or
adding entire modules with those taken from other sources. A genetically engineered
PKS complex should of course have the ability to catalyze the synthesis of the product
predicted from the genetic alterations made.
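
The conserved domain order stated above lends itself to a simple consistency check when planning an engineered module. The following Python sketch is a hypothetical helper, not part of the disclosure; it checks only that the domains listed for a module appear in the N-to-C order KS, AT, DH, ER, KR, ACP, allowing any of them to be absent.

```python
CANONICAL_ORDER = ("KS", "AT", "DH", "ER", "KR", "ACP")


def follows_canonical_order(domains):
    """Return True if `domains` is an ordered subset of CANONICAL_ORDER."""
    try:
        positions = [CANONICAL_ORDER.index(d) for d in domains]
    except ValueError:
        # An abbreviation outside the six domains named in the text.
        return False
    # Strictly increasing positions: correct order and no repeated domain.
    return all(a < b for a, b in zip(positions, positions[1:]))


print(follows_canonical_order(["KS", "AT", "KR", "ACP"]))        # True
print(follows_canonical_order(["KS", "AT", "KR", "DH", "ACP"]))  # False
```

Passing this check says nothing about whether an engineered enzyme folds or is active; it only flags layouts that depart from the order described in the text.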

Alignments of the many available amino acid sequences for Type I PKS enzymes
have approximately defined the boundaries of the various catalytic domains. Sequence
alignments also have revealed linker regions between the catalytic domains and at the N-
and C-termini of individual polypeptides. The sequences of these linker regions are less
well conserved than are those for the catalytic domains, which is in part how linker
regions are identified. Linker regions can be important for proper association between
domains and between the individual polypeptides that comprise the PKS complex. One
can thus view the linkers and domains together as creating a scaffold on which the
domains and modules are positioned in the correct orientation to be active. This
organization and positioning, if retained, permits PKS domains of different or identical
substrate specificities to be substituted (usually at the DNA level) between PKS enzymes
by various available methodologies. In selecting the boundaries of, for example, an AT
replacement, one can thus make the replacement so as to retain the linkers of the
recipient PKS or to replace them with the linkers of the donor PKS AT domain, or,
preferably, make both constructs to ensure that the correct linker regions between the KS
and AT domains have been included in at least one of the engineered enzymes. Thus,
there is considerable flexibility in the design of new PKS enzymes with the result that
known polyketides can be produced more effectively, and novel polyketides useful as
pharmaceuticals or for other purposes can be made.

By appropriate application of recombinant DNA technology, a wide variety of
polyketides can be prepared in a variety of different host cells provided one has access to
nucleic acid compounds that encode PKS proteins and polyketide modification enzymes.
The present invention helps meet the need for such nucleic acid compounds by providing
recombinant vectors that encode the FK-520 PKS enzyme and various FK-520
modification enzymes. Moreover, while the FK-506 and FK-520 polyketides have many
useful activities, there remains a need for compounds with similar useful activities but
with a better pharmacokinetic profile and metabolism and fewer side-effects. The present
invention helps meet the need for such compounds as well.

Summary of the Invention 
In one embodiment, the present invention provides recombinant DNA vectors
that encode all or part of the FK-520 PKS enzyme. Illustrative vectors of the invention
include cosmids pKOS034-120, pKOS034-124, pKOS065-C31, pKOS065-C3,
pKOS065-M27, and pKOS065-M21. The invention also provides nucleic acid
compounds that encode the various domains of the FK-520 PKS, i.e., the KS, AT, ACP,
KR, DH, and ER domains. These compounds can be readily used, alone or in
combination with nucleic acids encoding other FK-520 or non-FK-520 PKS domains, as
intermediates in the construction of recombinant vectors that encode all or part of PKS
enzymes that make novel polyketides.

The invention also provides isolated nucleic acids that encode all or part of one or
more modules of the FK-520 PKS, each module comprising a ketosynthase activity, an
acyl transferase activity, and an acyl carrier protein activity. The invention provides an
isolated nucleic acid that encodes one or more open reading frames of FK-520 PKS
genes, said open reading frames comprising coding sequences for a CoA ligase activity,
an NRPS activity, or two or more extender modules. The invention also provides
recombinant expression vectors containing these nucleic acids.

In another embodiment, the invention provides isolated nucleic acids that encode
all or a part of a PKS that contains at least one module in which at least one of the
domains in the module is a domain from a non-FK-520 PKS and at least one domain is
from the FK-520 PKS. The non-FK-520 PKS domain or module originates from the
rapamycin PKS, the FK-506 PKS, DEBS, or another PKS. The invention also provides
recombinant expression vectors containing these nucleic acids.

In another embodiment, the invention provides a method of preparing a
polyketide, said method comprising transforming a host cell with a recombinant DNA
vector that encodes at least one module of a PKS, said module comprising at least one
FK-520 PKS domain, and culturing said host cell under conditions such that said PKS is
produced and catalyzes synthesis of said polyketide. In one aspect, the method is
practiced with a Streptomyces host cell. In another aspect, the polyketide produced is
FK-520. In another aspect, the polyketide produced is a polyketide related in structure to
FK-520. In another aspect, the polyketide produced is a polyketide related in structure to
FK-506 or rapamycin.

In another embodiment, the invention provides a set of genes in recombinant
form sufficient for the synthesis of ethylmalonyl CoA in a heterologous host cell. These
genes and the methods of the invention enable one to create recombinant host cells with
the ability to produce polyketides or other compounds that require ethylmalonyl CoA for
biosynthesis. The invention also provides recombinant nucleic acids that encode AT
domains specific for ethylmalonyl CoA. Thus, the compounds of the invention can be
used to produce polyketides requiring ethylmalonyl CoA in host cells that otherwise are
unable to produce such polyketides.

In another embodiment, the invention provides a set of genes in recombinant
form sufficient for the synthesis of 2-hydroxymalonyl CoA and 2-methoxymalonyl CoA
in a heterologous host cell. These genes and the methods of the invention enable one to
create recombinant host cells with the ability to produce polyketides or other compounds
that require 2-hydroxymalonyl CoA for biosynthesis. The invention also provides
recombinant nucleic acids that encode AT domains specific for 2-hydroxymalonyl CoA
and 2-methoxymalonyl CoA. Thus, the compounds of the invention can be used to
produce polyketides requiring 2-hydroxymalonyl CoA or 2-methoxymalonyl CoA in
host cells that are otherwise unable to produce such polyketides.

In another embodiment, the invention provides a compound related in structure to
FK-520 or FK-506 that is useful in the treatment of a medical condition. These
compounds include compounds in which the C-13 methoxy group is replaced by a
moiety selected from the group consisting of hydrogen, methyl, and ethyl moieties. Such
compounds are less susceptible to the main in vivo pathway of degradation for FK-520
and FK-506 and related compounds and thus exhibit an improved pharmacokinetic
profile. The compounds of the invention also include compounds in which the C-15
methoxy group is replaced by a moiety selected from the group consisting of hydrogen,
methyl, and ethyl moieties. The compounds of the invention also include the above
compounds further modified by chemical methodology to produce derivatives such as,
but not limited to, the C-18 hydroxyl derivatives, which have potent neurotrophin but not
immunosuppression activities.

Thus, the invention provides polyketides having the structure:

[Structure drawing not reproduced in this text]

wherein R1 is hydrogen, methyl, ethyl, or allyl; R2 is hydrogen or hydroxyl, provided
that when R2 is hydrogen, there is a double bond between C-20 and C-19; R3 is hydrogen
or hydroxyl; R4 is methoxyl, hydrogen, methyl, or ethyl; and R5 is methoxyl, hydrogen,
methyl, or ethyl; but not including FK-506, FK-520, 18-hydroxy-FK-520, and
18-hydroxy-FK-506. The invention provides these compounds in purified form and in
pharmaceutical compositions.
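
To give a sense of the size of the substituent space defined above, the following Python sketch enumerates every combination of the five variable groups, reading them as R1 through R5. It counts raw combinations only; the variable names are illustrative, and no attempt is made to map the four excluded named compounds onto specific combinations, since the structure drawing is not available in this text.

```python
from itertools import product

# Allowed substituents as listed in the text.
R1 = ("hydrogen", "methyl", "ethyl", "allyl")
R2 = ("hydrogen", "hydroxyl")  # hydrogen implies a C-19/C-20 double bond
R3 = ("hydrogen", "hydroxyl")
R4 = ("methoxyl", "hydrogen", "methyl", "ethyl")
R5 = ("methoxyl", "hydrogen", "methyl", "ethyl")

combinations = list(product(R1, R2, R3, R4, R5))
print(len(combinations))  # 256, before the four named exclusions
```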

In another embodiment, the invention provides a method for treating a medical
condition by administering a pharmaceutically efficacious dose of a compound of the
invention. The compounds of the invention may be administered to achieve
immunosuppression or to stimulate nerve growth and regeneration.

These and other embodiments and aspects of the invention will be more fully
understood after consideration of the attached Drawings and their brief description
below, together with the detailed description, examples, and claims that follow.

Brief Description of the Drawings 
Figure 1 shows a diagram of the FK-520 biosynthetic gene cluster. The top line
provides a scale in kilobase pairs (kb). The second line shows a restriction map with
selected restriction enzyme recognition sequences indicated. K is KpnI; X is XhoI; S is
SacI; P is PstI; and E is EcoRI. The third line indicates the position of FK-520 PKS and
related genes. Genes are abbreviated with a one letter designation, i.e., C is fkbC.
Immediately under the third line are numbered segments showing where the loading
module (L) and ten different extender modules (numbered 1 - 10) are encoded on the
various genes shown. At the bottom of the Figure, the DNA inserts of various cosmids of
the invention (i.e., 34-124 is cosmid pKOS034-124) are shown in alignment with the
FK-520 biosynthetic gene cluster.

Figure 2 shows the loading module (load), the ten extender modules, and the
peptide synthetase domain of the FK-520 PKS, together with, on the top line, the genes
that encode the various domains and modules. Also shown are the various intermediates
in FK-520 biosynthesis, as well as the structure of FK-520, with carbons 13, 15, 21, and
31 numbered. The various domains of each module and subdomains of the loading
module are also shown. The darkened circles showing the DH domains in modules 2, 3,
and 4 indicate that the dehydratase domain is not functional as a dehydratase; this
domain may affect the stereochemistry at the corresponding position in the polyketide.
The substituents on the FK-520 structure that result from the action of non-PKS enzymes
are also indicated by arrows, together with the types of enzymes or the genes that code
for the enzymes that mediate the action. Although the methyltransferase is shown acting
at the C-13 and C-15 hydroxyl groups after release of the polyketide from the PKS, the
methyltransferase may act on the 2-hydroxymalonyl substrate prior to or
contemporaneously with its incorporation during polyketide synthesis.

Figure 3 shows a close-up view of the left end of the FK-520 gene cluster, which
contains at least ten additional genes. The ethyl side chain on carbon 21 of FK-520
(Figure 2) is derived from an ethylmalonyl CoA extender unit that is incorporated by an
ethylmalonyl specific AT domain in extender module 4 of the PKS. At least four of the
genes in this region code for enzymes involved in ethylmalonyl biosynthesis. The
polyhydroxybutyrate depolymerase is involved in maintaining hydroxybutyryl-CoA
pools during FK-520 production. Polyhydroxybutyrate accumulates during vegetative
growth and disappears during stationary phase in other Streptomyces (Ranade and
Vining, 1993, Can. J. Microbiol. 39: 377). Open reading frames with unknown function
are indicated with a question mark.

Figure 4 shows a biosynthetic pathway for the biosynthesis of ethylmalonyl CoA
from acetoacetyl CoA consistent with the function assigned to four of the genes in the
FK-520 gene cluster shown in Figure 3.

Figure 5 shows a close-up view of the right end of the FK-520 PKS gene cluster
(and of the sequences on cosmid pKOS065-C31). The genes shown include fkbD, fkbM
(a methyl transferase that methylates the hydroxyl group on C-31 of FK-520), fkbN (a
homolog of a gene described as a regulator of cholesterol oxidase and that is believed to
be a transcriptional activator), fkbQ (a type II thioesterase, which can increase polyketide
production levels), and fkbS (a crotonyl-CoA reductase involved in the biosynthesis of
ethylmalonyl CoA).

Figure 6 shows the proposed degradative pathway for tacrolimus (FK-506)
metabolism.

Figure 7 shows a schematic process for the construction of recombinant PKS
genes of the invention that encode PKS enzymes that produce 13-desmethoxy FK-506
and FK-520 polyketides of the invention, as described in Example 4, below.

Figure 8, in Parts A and B, shows certain compounds of the invention preferred
for dermal application in Part A and a synthetic route for making those compounds in
Part B.

Detailed Description of the Invention 
Given the valuable pharmaceutical properties of polyketides, there is a need for
methods and reagents for producing large quantities of polyketides, as well as for
producing related compounds not found in nature. The present invention provides such
methods and reagents, with particular application to methods and reagents for producing
the polyketides known as FK-520, also known as ascomycin or L-683,590 (see Holt et
al., 1993, JACS 115: 9925), and FK-506, also known as tacrolimus. Tacrolimus is a
macrolide immunosuppressant used to prevent or treat rejection of transplanted heart,
kidney, liver, lung, pancreas, and small bowel allografts. The drug is also useful for the
prevention and treatment of graft-versus-host disease in patients receiving bone marrow
transplants, and for the treatment of severe, refractory uveitis. There have been additional
reports of the unapproved use of tacrolimus for other conditions, including alopecia
universalis, autoimmune chronic active hepatitis, inflammatory bowel disease, multiple
sclerosis, primary biliary cirrhosis, and scleroderma. The invention provides methods
and reagents for making novel polyketides related in structure to FK-520 and FK-506,
and structurally related polyketides such as rapamycin.

The FK-506 and rapamycin polyketides are potent immunosuppressants, with
chemical structures shown below.

[Structure drawings of FK-506 and rapamycin not reproduced in this text]

FK-520 differs from FK-506 in that it has an ethyl group rather than an allyl group at
C-21, and it has activity similar to that of FK-506, albeit with reduced
immunosuppressive activity.

These compounds act through initial formation of an intermediate complex with
protein "immunophilins" known as FKBPs (FK-506 binding proteins), including
FKBP-12. Immunophilins are a class of cytosolic proteins that form complexes with molecules
such as FK-506, FK-520, and rapamycin that in turn serve as ligands for other cellular
targets involved in signal transduction. Binding of FK-506, FK-520, and rapamycin to
FKBP occurs through the structurally similar segments of the polyketide molecules,
known as the "FKBP-binding domain" (as generally but not precisely indicated by the
stippled regions in the structures above). The FK-506-FKBP complex then binds
calcineurin, while the rapamycin-FKBP complex binds to a protein known as RAFT-1.
Binding of the FKBP-polyketide complex to these second proteins occurs through the
dissimilar regions of the drugs known as the "effector" domains.



The three component FKBP-polyketide-effector complex is required for signal
transduction and subsequent immunosuppressive activity of FK-506, FK-520, and
rapamycin. Modifications in the effector domains of FK-506, FK-520, and rapamycin
that destroy binding to the effector proteins (calcineurin or RAFT) lead to loss of
immunosuppressive activity, even though FKBP binding is unaffected. Further, such
analogs antagonize the immunosuppressive effects of the parent polyketides, because
they compete for FKBP. Such non-immunosuppressive analogs also show reduced
toxicity (see Dumont et al., 1992, Journal of Experimental Medicine 176: 751-760),
indicating that much of the toxicity of these drugs is not linked to FKBP binding.

In addition to immunosuppressive activity, FK-520, FK-506, and rapamycin have
neurotrophic activity. In the central nervous system and in peripheral nerves,
immunophilins are referred to as "neuroimmunophilins". The neuroimmunophilin FKBP
is markedly enriched in the central nervous system and in peripheral nerves. Molecules
that bind to the neuroimmunophilin FKBP, such as FK-506 and FK-520, have the
remarkable effect of stimulating nerve growth. In vitro, they act as neurotrophins, i.e.,
they promote neurite outgrowth in NGF-treated PC12 cells and in sensory neuronal
cultures, and in intact animals, they promote regrowth of damaged facial and sciatic
nerves, and repair lesioned serotonin and dopamine neurons in the brain. See Gold et al.,
Jun. 1999, J. Pharm. Exp. Ther. 289(3): 1202-1210; Lyons et al., 1994, Proc. National
Academy of Science 91: 3191-3195; Gold et al., 1995, Journal of Neuroscience 15:
7509-7516; and Steiner et al., 1997, Proc. National Academy of Science 94: 2019-2024.
Further, the restored central and peripheral neurons appear to be functional.

Compared to protein neurotrophic molecules (BDNF, NGF, etc.), the
small-molecule neurotrophins such as FK-506, FK-520, and rapamycin have different, and
often advantageous, properties. First, whereas protein neurotrophins are difficult to
deliver to their intended site of action and may require intra-cranial injection, the
small-molecule neurotrophins display excellent bioavailability; they are active when
administered subcutaneously and orally. Second, whereas protein neurotrophins show
quite specific effects, the small-molecule neurotrophins show rather broad effects.
Finally, whereas protein neurotrophins often show effects on normal sensory nerves, the
small-molecule neurotrophins do not induce aberrant sprouting of normal neuronal
processes and seem to affect damaged nerves specifically. Neuroimmunophilin ligands
have potential therapeutic utility in a variety of disorders involving nerve degeneration
(e.g., multiple sclerosis, Parkinson's disease, Alzheimer's disease, stroke, traumatic
spinal cord and brain injury, peripheral neuropathies).

Recent studies have shown that the immunosuppressive and neurite outgrowth
activities of FK-506, FK-520, and rapamycin can be separated; the neuroregenerative
activity in the absence of immunosuppressive activity is retained by agents which bind to
FKBP but not to the effector proteins calcineurin or RAFT. See Steiner et al., 1997,
Nature Medicine 3: 421-428.




[Scheme not reproduced in this text; surviving label: "Nerve Regeneration"]



Available structure-activity data show that the important features for neurotrophic
activity of rapamycin, FK-520, and FK-506 lie within the common, contiguous segments
of the macrolide ring that bind to FKBP. This portion of the molecule is termed the
"FKBP binding domain" (see VanDuyne et al., 1993, Journal of Molecular Biology 229:
105-124). Nevertheless, the effector domains of the parent macrolides contribute to
conformational rigidity of the binding domain and thus indirectly contribute to FKBP
binding.

[Structure drawing not reproduced in this text; surviving label: "FKBP binding domain"]

There are a number of other reported analogs of FK-506, FK-520, and rapamycin that
bind to FKBP but not the effector protein calcineurin or RAFT. These analogs show
effects on nerve regeneration without immunosuppressive effects.

Naturally occurring FK-520 and FK-506 analogs include the antascomycins,
which are FK-506-like macrolides that lack the functional groups of FK-506 that bind to
calcineurin (see Fehr et al., 1996, The Journal of Antibiotics 49: 230-233). These
molecules bind FKBP as effectively as does FK-506; they antagonize the effects of both
FK-506 and rapamycin, yet lack immunosuppressive activity.




[Structure drawing of antascomycin A not reproduced in this text]

Other analogs can be produced by chemically modifying FK-506, FK-520, or
rapamycin. One approach to obtaining neuroimmunophilin ligands is to destroy the
effector binding region of FK-506, FK-520, or rapamycin by chemical modification.
While the chemical modifications permitted on the parent compounds are quite limited,
some useful chemically modified analogs exist. The FK-520 analog L-685,818 (ED50 =
0.7 nM for FKBP binding; see Dumont et al., 1992), and the rapamycin analog
WAY-124,466 (IC50 = 12.5 nM; see Ocain et al., 1993, Biochemical and Biophysical Research
Communications 192: 1340-1346) are about as effective as FK-506, FK-520, and
rapamycin at promoting neurite outgrowth in sensory neurons (see Steiner et al., 1997).

13 
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L-685,818 



WAY-124,466

One of the few positions of rapamycin that is readily amenable to chemical 
modification is the allylic 16-methoxy group; this reactive group is readily exchanged by 
acid-catalyzed nucleophilic substitution. Replacement of the 16-methoxy group of 
rapamycin with a variety of bulky groups has produced analogs showing selective loss of 
immunosuppressive activity while retaining FKBP-binding (see Luengo et al., 1995, 
Chemistry & Biology 2: 471-481). One of the best compounds, 1, below, shows complete 
loss of activity in the splenocyte proliferation assay with only a 10-fold reduction in 
binding to FKBP.



There are also synthetic analogs of FKBP binding domains. These compounds
reflect an approach to obtaining neuroimmunophilin ligands based on "rationally
designed" molecules that retain the FKBP-binding region in an appropriate conformation
for binding to FKBP, but do not possess the effector binding regions. In one example, the
ends of the FKBP binding domain were tethered by hydrocarbon chains (see Holt et al.,
1993, Journal of the American Chemical Society 115: 9925-9938); the best analog, 2,
below, binds to FKBP about as well as FK-506. In a similar approach, the ends of the
FKBP binding domain were tethered by a tripeptide to give analog 3, below, which binds
to FKBP about 20-fold more weakly than FK-506. These compounds are anticipated to have
neuroimmunophilin binding activity.




2 3 



In a primate MPTP model of Parkinson's disease, administration of FKBP ligand
GPI-1046 caused brain cells to regenerate and behavioral measures to improve. MPTP is
a neurotoxin that, when administered to animals, selectively damages nigral-striatal
dopamine neurons in the brain, mimicking the damage caused by Parkinson's disease.
Whereas, before treatment, animals were unable to use affected limbs, the FKBP ligand
restored the ability of animals to feed themselves and gave improvements in measures of
locomotor activity, neurological outcome, and fine motor control. There were also
corresponding increases in regrowth of damaged nerve terminals. These results
demonstrate the utility of FKBP ligands for treatment of diseases of the CNS.

From the above description, two general approaches towards the design of
non-immunosuppressant neuroimmunophilin ligands can be seen. The first involves the
construction of constrained cyclic analogs of FK-506 in which the FKBP binding domain
is fixed in a conformation optimal for binding to FKBP. The advantages of this approach
are that the conformation of the analogs can be accurately modeled and predicted by
computational methods, and the analogs closely resemble parent molecules that have
proven pharmacological properties. A disadvantage is that the difficult chemistry limits
the numbers and types of compounds that can be prepared. The second approach
involves the trial and error construction of acyclic analogs of the FKBP binding domain
by conventional medicinal chemistry. The advantage of this approach is that the
chemistry is suitable for production of the numerous compounds needed for such
interactive chemistry-bioassay approaches. The disadvantages are that the molecular
types of compounds that have emerged have no known history of appropriate
pharmacological properties, have rather labile ester functional groups, and are too
conformationally mobile to allow accurate prediction of conformational properties.

The present invention provides useful methods and reagents related to the first
approach, but with significant advantages. The invention provides recombinant PKS
genes that produce a wide variety of polyketides that cannot otherwise be readily
synthesized by chemical methodology alone. Moreover, the present invention provides
polyketides that have either or both of the desired immunosuppressive and neurotrophic
activities, some of which are produced only by fermentation and others of which are
produced by fermentation and chemical modification. Thus, in one aspect, the invention
provides compounds that optimally bind to FKBP but do not bind to the effector
proteins. The methods and reagents of the invention can be used to prepare numerous
constrained cyclic analogs of FK-520 in which the FKBP binding domain is fixed in a
conformation optimal for binding to FKBP. Such compounds will show
neuroimmunophilin binding (neurotrophic) but not immunosuppressive effects. The
invention also allows direct manipulation of FK-520 and related chemical structures via
genetic engineering of the enzymes involved in the biosynthesis of FK-520 (as well as
related compounds, such as FK-506 and rapamycin); similar modifications by chemical
means are simply not possible because of the complexity of the structures. The invention
can also be used to introduce "chemical handles" into normally inert positions that permit
subsequent chemical modifications.

Several general approaches to achieve the development of novel
neuroimmunophilin ligands are facilitated by the methods and reagents of the present
invention. One approach is to make "point mutations" of the functional groups of the
parent FK-520 structure that bind to the effector molecules to eliminate their binding
potential. These types of structural modifications are difficult to perform by chemical
modification, but can be readily accomplished with the methods and reagents of the
invention.

A second, more extensive approach facilitated by the present invention is to
utilize molecular modeling to predict optimal structures ab initio that bind to FKBP but
not effector molecules. Using the available X-ray crystal structure of FK-520 (or
FK-506) bound to FKBP, molecular modeling can be used to predict polyketides that should
optimally bind to FKBP but not calcineurin. Various macrolide structures can be
generated by linking the ends of the FKBP-binding domain with "all possible"
polyketide chains of variable length and substitution patterns that can be prepared by
genetic manipulation of the FK-520 or FK-506 PKS gene cluster in accordance with the
methods of the invention. The ground state conformations of the virtual library can be
determined, and compounds that possess binding domains most likely to bind well to
FKBP can be prepared and tested.
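
By way of illustration only, the enumeration of such a virtual library of linker chains can be sketched in Python as follows. The extender unit names, the set of beta-carbon processing states, and the function names are assumptions made for this sketch and are not taken from the disclosure; the sketch only lists and counts combinations and does not model conformation or FKBP binding.

```python
from itertools import product

# Hypothetical building blocks for the sketch (not specified in the text).
EXTENDER_UNITS = ("malonyl", "methylmalonyl", "ethylmalonyl")
# Possible states of the beta-carbon after each condensation.
REDUCTION_STATES = ("keto", "hydroxyl", "alkene", "alkane")


def enumerate_linkers(min_modules, max_modules):
    """Yield every linker chain as a tuple of (extender, reduction) pairs."""
    module_choices = list(product(EXTENDER_UNITS, REDUCTION_STATES))
    for n_modules in range(min_modules, max_modules + 1):
        for chain in product(module_choices, repeat=n_modules):
            yield chain


def library_size(min_modules, max_modules):
    """Count the chains without generating them."""
    per_module = len(EXTENDER_UNITS) * len(REDUCTION_STATES)
    return sum(per_module ** n for n in range(min_modules, max_modules + 1))


if __name__ == "__main__":
    print(library_size(1, 3))
    for chain in list(enumerate_linkers(1, 1))[:3]:
        print(chain)
```

Because the number of chains grows exponentially with the number of modules, a practical workflow would presumably filter candidates (for example by predicted ground state conformation) rather than prepare every member.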

Once a compound is identified in accordance with the above approaches, the
invention can be used to generate a focused library of analogs around the lead candidate,
to "fine tune" the compound for optimal properties. Finally, the genetic engineering
methods of the invention can be directed towards producing "chemical handles" that
enable medicinal chemists to modify positions of the molecule previously inert to
chemical modification. This opens the path to previously prohibited chemical
optimization of lead compounds by time-proven approaches.

Moreover, the present invention provides polyketide compounds, and the
recombinant genes for the PKS enzymes that produce those compounds, that have
significant advantages over FK-506 and FK-520 and their analogs. The metabolism and
pharmacokinetics of tacrolimus (FK-506) have been extensively studied, and FK-520 is
believed to be similar in these respects. Absorption of tacrolimus is rapid, variable, and
incomplete from the gastrointestinal tract (Harrison's Principles of Internal Medicine,
14th edition, 1998, McGraw Hill, 14, 20, 21, 64-67). The mean bioavailability of the oral
dosage form is 27% (range 5 to 65%). The volume of distribution (VolD) based on plasma
is 5 to 65 L per kg of body weight (L/kg), and is much higher than the VolD based on
whole blood concentrations, the difference reflecting the binding of tacrolimus to red
blood cells. Whole blood concentrations may be 12 to 67 times the plasma concentrations.
Protein binding is high (75 to 99%), primarily to albumin and alpha1-acid glycoprotein.
The half-life for distribution is 0.9 hour; elimination is biphasic and variable, with a
terminal half-life of 11.3 hours (range, 3.5 to 40.5 hours). The time to peak concentration
is 0.5 to 4 hours after oral administration.
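
As a rough illustration of how the figures above can be used, the following Python sketch estimates the amount of an oral dose that reaches the circulation and the fraction remaining after a given time. It assumes simple first-order (single-exponential) elimination governed by the terminal half-life, which ignores the distribution phase of the biphasic elimination described above; the 10 mg dose and the function names are hypothetical.

```python
# Values quoted in the text for tacrolimus.
MEAN_BIOAVAILABILITY = 0.27      # mean oral bioavailability (27%)
TERMINAL_HALF_LIFE_H = 11.3      # terminal elimination half-life, hours


def absorbed_dose_mg(oral_dose_mg, bioavailability=MEAN_BIOAVAILABILITY):
    """Amount of an oral dose expected to reach the circulation."""
    if not 0.0 <= bioavailability <= 1.0:
        raise ValueError("bioavailability must be between 0 and 1")
    return oral_dose_mg * bioavailability


def fraction_remaining(hours, half_life_hours=TERMINAL_HALF_LIFE_H):
    """Fraction of drug remaining after `hours` (first-order assumption)."""
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (hours / half_life_hours)


if __name__ == "__main__":
    dose = 10.0  # hypothetical oral dose in mg
    print(absorbed_dose_mg(dose))
    print(fraction_remaining(24.0))
```

Substituting the extremes of the reported ranges (5 to 65% bioavailability, 3.5 to 40.5 hour half-life) into the same functions gives a sense of the variability in exposure discussed in this section.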

Tacrolimus is metabolized primarily by cytochrome P450 3A enzymes in the
liver and small intestine. The drug is extensively metabolized, with less than 1% excreted
unchanged in urine. Because hepatic dysfunction decreases clearance of tacrolimus,
doses have to be reduced substantially in primary graft non-function, especially in
children. In addition, drugs that induce the cytochrome P450 3A enzymes reduce
tacrolimus levels, while drugs that inhibit these P450s increase tacrolimus levels.
Tacrolimus bioavailability doubles with co-administration of ketoconazole, a drug that
inhibits P450 3A. See Vincent et al., 1992, In vitro metabolism of FK-506 in rat, rabbit,
and human liver microsomes: Identification of a major metabolite and of cytochrome
P450 3A as the major enzymes responsible for its metabolism, Arch. Biochem. Biophys.
294: 454-460; Iwasaki et al., 1993, Isolation, identification, and biological activities of
oxidative metabolites of FK-506, a potent immunosuppressive macrolide lactone, Drug
Metabolism & Disposition 21: 971-977; Shiraga et al., 1994, Metabolism of FK-506, a
potent immunosuppressive agent, by cytochrome P450 3A enzymes in rat, dog, and
human liver microsomes, Biochem. Pharmacol. 47: 727-735; and Iwasaki et al., 1995,
Further metabolism of FK-506 (Tacrolimus); Identification and biological activities of
the metabolites oxidized at multiple sites of FK-506, Drug Metabolism & Disposition 23:
28-34. The cytochrome P450 3A subfamily of isozymes has been implicated as
important in this degradative process.

Structures of the eight isolated metabolites formed by liver microsomes are
shown in Figure 6. Four metabolites of FK-506 involve demethylation of the oxygens on
carbons 13, 15, and 31, and hydroxylation of carbon 12. The 13-demethylated (hydroxy)
compounds undergo cyclizations of the 13-hydroxy at C-10 to give M-I, M-VI, and M-VII,
and the 12-hydroxy metabolite at C-10 to give I. Another four metabolites, formed by
oxidation of the four metabolites mentioned above, were isolated using liver microsomes
from dexamethasone-treated rats. Three of these are metabolites doubly demethylated at
the methoxy groups on carbons 15 and 31 (M-V), 13 and 31 (M-VI), and 13 and 15
(M-VII). The fourth, M-VIII, was the metabolite produced after demethylation of the
31-methoxy group, followed by formation of a fused ring system by further oxidation.
Among the eight metabolites, M-II has immunosuppressive activity comparable to that
of FK-506, whereas the other metabolites exhibit weak or negligible activities.
Importantly, the major metabolite of human, dog, and rat liver microsomes is the
13-demethylated and cyclized FK-506 (M-I).

Thus, the major metabolism of FK-506 proceeds via 13-demethylation followed
by cyclization to the inactive M-I, this representing about 90% of the metabolic products
after a 10 minute incubation with liver microsomes. Analogs of tacrolimus that do not
possess a C-13 methoxy group would not be susceptible to the first and most important
biotransformation in the destructive metabolism of tacrolimus (i.e., cyclization of
13-hydroxy to C-10). Thus, a 13-desmethoxy analog of FK-506 should have a longer
half-life in the body than does FK-506. The C-13 methoxy group is believed not to be
required for binding to FKBP or calcineurin. The C-13 methoxy is not present at the
corresponding position of rapamycin, which binds to FKBP with an affinity comparable
to that of tacrolimus. Also, analysis of the 3-dimensional structure of the
FKBP-tacrolimus-calcineurin complex shows that the C-13 methoxy has no interaction
with FKBP and only a minor interaction with calcineurin. The present invention provides
C-13-desmethoxy analogs of FK-506 and FK-520, as well as the recombinant genes that
encode the PKS enzymes that catalyze their synthesis and host cells that produce the
compounds.

These compounds exhibit, relative to their naturally occurring counterparts,
prolonged immunosuppressive action in vivo, thereby allowing a lower dosage and/or
reduced frequency of administration. Dosing is more predictable, because the variability
in FK-506 dosage is largely due to variation of metabolism rate. FK-506 levels in blood
can vary widely depending on interactions with drugs that induce or inhibit cytochrome
P450 3A (summarized in USP Drug Information for the Health Care Professional). Of
particular importance are the numerous drugs that inhibit or compete for CYP 3A,
because they increase FK-506 blood levels and lead to toxicity (Prograf package insert,
Fujisawa US, Rev 4/97, Rec 6/97). Also important are the drugs that induce P450 3A
(e.g., dexamethasone), because they decrease FK-506 blood levels and reduce efficacy.
Because the major site of CYP 3A action on FK-506 is removed in the analogs provided
by the present invention, those analogs are not as susceptible to drug interactions as the
naturally occurring compounds.

Hyperglycemia, nephrotoxicity, and neurotoxicity are the most significant 
adverse effects resulting from the use of FK-506 and are believed to be similar for FK- 
520. Because these effects appear to occur primarily by the same mechanism as the 
immunosuppressive action (i.e. FKBP-calcineurin interaction), the intrinsic toxicity of 
the desmethoxy analogs may be similar to FK-506. However, toxicity of FK-506 is dose 
related and correlates with high blood levels of the drug (Prograf package insert, 
Fujisawa US, Rev 4/97, Rec 6/97). Because the levels of the compounds provided by
the present invention should be more controllable, the incidence of toxicity should be 
significantly decreased with the 13-desmethoxy analogs. Some reports show that certain 
FK-506 metabolites are more toxic than FK-506 itself, and this provides an additional 
reason to expect that a CYP 3A resistant analog can have lower toxicity and a higher 
therapeutic index. 

Thus, the present invention provides novel compounds related in structure to FK- 
506 and FK-520 but with improved properties. The invention also provides methods for 
making these compounds by fermentation of recombinant host cells, as well as the 
recombinant host cells, the recombinant vectors in those host cells, and the recombinant 
proteins encoded by those vectors. The present invention also provides other valuable 
materials useful in the construction of these recombinant vectors that have many other 
important applications as well. In particular, the present invention provides the FK-520 
PKS genes, as well as certain genes involved in the biosynthesis of FK-520 in 
recombinant form.

FK-520 is produced at relatively low levels in the naturally occurring cells,
Streptomyces hygroscopicus var. ascomyceticus, in which it was first identified. Thus,
another benefit provided by the recombinant FK-520 PKS and related genes of the 
present invention is the ability to produce FK-520 in greater quantities in the 
recombinant host cells provided by the invention. The invention also provides methods 
for making novel FK-520 analogs, in addition to the desmethoxy analogs described 
above, and derivatives in recombinant host cells of any origin. 

The biosynthesis of FK-520 involves the action of several enzymes. The FK-520
PKS enzyme, which is composed of the fkbA, fkbB, fkbC, and fkbP gene products,
synthesizes the core structure of the molecule. There is also a hydroxylation at C-9
mediated by the P450 hydroxylase that is the fkbD gene product; the resulting hydroxyl
group is oxidized by the fkbO gene product to form a keto group at C-9. There is also a
methylation at C-31 that is mediated by an O-methyltransferase that is the fkbM gene
product. There are also methylations at the C-13 and C-15 positions by a
methyltransferase believed to be encoded by the fkbG gene; this methyltransferase may
act on the hydroxymalonyl CoA substrates prior to binding of the substrate to the AT
domains of the PKS during polyketide synthesis. The present invention provides the
genes encoding these enzymes in recombinant form. The invention also provides the
genes encoding the enzymes involved in ethylmalonyl CoA and 2-hydroxymalonyl CoA
biosynthesis in recombinant form. Moreover, the invention provides Streptomyces
hygroscopicus var. ascomyceticus recombinant host cells lacking one or more of these
genes that are useful in the production of useful compounds.

The cells are useful in production in a variety of ways. First, certain cells make a
useful FK-520-related compound merely as a result of inactivation of one or more of the
FK-520 biosynthesis genes. Thus, by inactivating the C-31 O-methyltransferase gene in
Streptomyces hygroscopicus var. ascomyceticus, one creates a host cell that makes a
desmethyl (at C-31) derivative of FK-520. Second, other cells of the invention are unable
to make FK-520 or FK-520 related compounds due to an inactivation of one or more of
the PKS genes. These cells are useful in the production of other polyketides produced by
PKS enzymes that are encoded on recombinant expression vectors and introduced into
the host cell.

Moreover, if only one PKS gene is inactivated, the ability to produce FK-520 or
an FK-520 derivative compound is restored by introduction of a recombinant expression
vector that contains the functional gene in a modified or unmodified form. The
introduced gene produces a gene product that, together with the other endogenous and
functional gene products, produces the desired compound. This methodology enables
one to produce FK-520 derivative compounds without requiring that all of the genes for
the PKS enzyme be present on one or more expression vectors. Additional applications
and benefits of such cells and methodology will be readily apparent to those of skill in
the art after consideration of how the recombinant genes were isolated and employed in
the construction of the compounds of the invention.

The FK-520 biosynthetic genes were isolated by the following procedure.
Genomic DNA was isolated from Streptomyces hygroscopicus var. ascomyceticus
(ATCC 14891) using the lysozyme/proteinase K protocol described in Genetic
Manipulation of Streptomyces - A Laboratory Manual (Hopwood et al., 1986). The
average size of the DNA was estimated to be between 80 and 120 kb by electrophoresis on
0.3% agarose gels. A library was constructed in the SuperCos™ vector according to the
manufacturer's instructions and with the reagents provided in the commercially available
kit (Stratagene). Briefly, 100 µg of genomic DNA was partially digested with 4 units of
Sau3A I for 20 min in a reaction volume of 1 mL, and the fragments were
dephosphorylated and ligated to SuperCos vector arms. The ligated DNA was packaged
and used to infect log-stage XL1-Blue MR cells. A library of about 10,000 independent
cosmid clones was obtained.
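
The adequacy of a library of this size can be estimated with the standard Clarke-Carbon relation, P = 1 - (1 - f)^N, where f is the ratio of insert size to genome size and N is the number of independent clones. The following Python sketch applies it; the 35 kb average insert and the 8,000 kb genome size are assumed values chosen for illustration and are not measurements reported here, and the function names are hypothetical.

```python
import math


def probability_of_coverage(n_clones, insert_kb, genome_kb):
    """Probability that a given locus is present in at least one clone."""
    fraction = insert_kb / genome_kb
    if not 0.0 < fraction < 1.0:
        raise ValueError("insert must be smaller than the genome")
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(probability, insert_kb, genome_kb):
    """Smallest number of clones giving at least the requested probability."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    fraction = insert_kb / genome_kb
    if not 0.0 < fraction < 1.0:
        raise ValueError("insert must be smaller than the genome")
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


if __name__ == "__main__":
    ASSUMED_INSERT_KB = 35.0     # assumption: typical cosmid insert size
    ASSUMED_GENOME_KB = 8000.0   # assumption: approximate genome size
    print(clones_needed(0.99, ASSUMED_INSERT_KB, ASSUMED_GENOME_KB))
    print(probability_of_coverage(10000, ASSUMED_INSERT_KB, ASSUMED_GENOME_KB))
```

Under these assumed sizes, roughly a thousand clones would already give a 99% chance of recovering any given locus, so a library of about 10,000 clones would be expected to represent the gene cluster many times over.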

Based on recently published sequence from the FK-506 cluster (Motamedi and
Shafiee, 1998, Eur. J. Biochem. 256: 528), a probe for the fkbO gene was isolated from
ATCC 14891 using PCR with degenerate primers. With this probe, a cosmid designated
pKOS034-124 was isolated from the library. With probes made from the ends of cosmid
pKOS034-124, an additional cosmid designated pKOS034-120 was isolated. These
cosmids (pKOS034-124 and pKOS034-120) were shown to contain DNA inserts that
overlap with one another. Initial sequence data from these two cosmids generated
sequences similar to sequences from the FK-506 and rapamycin clusters, indicating that
the inserts were from the FK-520 PKS gene cluster. Two EcoRI fragments were
subcloned from cosmids pKOS034-124 and pKOS034-120. These subclones were used
to prepare shotgun libraries by partial digestion with Sau3AI, gel purification of
fragments between 1.5 kb and 3 kb in size, and ligation into the pLitmus28 vector (New
England Biolabs). These libraries were sequenced using dye terminators on a Beckman
CEQ2000 capillary electrophoresis sequencer, according to the manufacturer's protocols.

To obtain cosmids containing sequence on the left and right sides of the
sequenced region described above, a new cosmid library of ATCC 14891 DNA was
prepared essentially as described above. This new library was screened with a new fkbM
probe isolated using DNA from ATCC 14891. A probe representing the fkbP gene at the
end of cosmid pKOS034-124 was also used. Several additional cosmids to the right of
the previously sequenced region were identified. Cosmids pKOS065-C31 and
pKOS065-C3 were identified and then mapped with restriction enzymes. Initial sequences
from these cosmids were consistent with the expected organization of the cluster in this
region. More extensive sequencing showed that both cosmids contained, in addition to the
desired sequences, other sequences not contiguous to the desired sequences on the host
cell chromosomal DNA. Probing of additional cosmid libraries identified two additional
cosmids, pKOS065-M27 and pKOS065-M21, that contained the desired sequences in a
contiguous segment of chromosomal DNA. Cosmids pKOS034-124, pKOS034-120,
pKOS065-M27, and pKOS065-M21 have been deposited with the American Type
Culture Collection, Manassas, VA, USA. The complete nucleotide sequence of the
coding sequences of the genes that encode the proteins of the FK-520 PKS is shown
below but can also be determined from the cosmids of the invention deposited with the
ATCC using standard methodology.

Referring to Figures 1 and 3, the FK-520 PKS gene cluster is composed of four
open reading frames designated fkbB, fkbC, fkbA, and fkbP. The fkbB open reading frame
encodes the loading module and the first four extender modules of the PKS. The fkbC
open reading frame encodes extender modules five and six of the PKS. The fkbA open
reading frame encodes extender modules seven, eight, nine, and ten of the PKS. The
fkbP open reading frame encodes the NRPS of the PKS. Each of these genes can be
isolated from the cosmids of the invention described above. The DNA sequences of these
genes are provided below preceded by the following table identifying the start and stop
codons of the open reading frames of each gene and the modules and domains contained
therein.



Nucleotides                     Gene or Domain

complement (412 - 1836)         fkbW
complement (2020 - 3579)        fkbV
complement (3969 - 4496)        fkbR2
complement (4595 - 5488)        fkbR1
5601 - 6818                     fkbE
6808 - 8052                     fkbF
8156 - 8824                     fkbG
complement (9122 - 9883)        fkbH
complement (9894 - 10994)       fkbI
complement (10987 - 11247)      fkbJ
complement (11244 - 12092)      fkbK
complement (12113 - 13150)      fkbL
complement (13212 - 23988)      fkbC
complement (23992 - 46573)      fkbB
46754 - 47788                   fkbO
47785 - 52272                   fkbP
52275 - 71465                   fkbA
71462 - 72628                   fkbD
72625 - 73407                   fkbM
complement (73460 - 76202)      fkbN
complement (76336 - 77080)      fkbQ
complement (77076 - 77535)      fkbS
complement (44974 - 46573)      CoA ligase of loading domain
complement (43777 - 44629)      ER of loading domain
complement (43144 - 43660)      ACP of loading domain
complement (41842 - 43093)      KS of extender module 1 (KS1)
complement (40609 - 41842)      AT1
complement (39442 - 40609)      DH1
complement (38677 - 39307)      KR1
complement (38371 - 38581)      ACP1
complement (37145 - 38296)      KS2
complement (35749 - 37144)      AT2
complement (34606 - 35749)      DH2 (inactive)
complement (33823 - 34480)      KR2
complement (33505 - 33715)      ACP2
complement (32185 - 33439)      KS3
complement (31018 - 32185)      AT3
complement (29869 - 31018)      DH3 (inactive)
complement (29092 - 29740)      KR3
complement (28750 - 28960)      ACP3
complement (27430 - 28684)      KS4
complement (26146 - 27430)      AT4
complement (24997 - 26146)      DH4 (inactive)
complement (24163 - 24373)      ACP4
complement (22653 - 23892)      KS5
complement (21420 - 22653)      AT5
complement (20241 - 21420)      DH5
complement (19464 - 20097)      KR5
complement (19116 - 19326)      ACP5
complement (17820 - 19053)      KS6
complement (16587 - 17820)      AT6
complement (15438 - 16587)      DH6
complement (14517 - 15294)      ER6
complement (13761 - 14394)      KR6
complement (13452 - 13662)      ACP6
52362 - 53576                   KS7
53577 - 54716                   AT7
54717 - 55871                   DH7
56019 - 56819                   ER7
56943 - 57575                   KR7
57710 - 57920                   ACP7
57990 - 59243                   KS8
59244 - 60398                   AT8
60399 - 61412                   DH8 (inactive)
61548 - 62180                   KR8
62328 - 62537                   ACP8
62598 - 63854                   KS9
63855 - 65084                   AT9
65085 - 66254                   DH9
66399 - 67175                   ER9
67299 - 67931                   KR9
68094 - 68303                   ACP9
68397 - 69653                   KS10
69654 - 70985                   AT10
71064 - 71273                   ACP10
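
The coordinates in the table above follow the convention in which "complement" marks a feature encoded on the reverse strand. A minimal Python sketch of how such entries might be parsed, and the corresponding coding strand extracted from a sequence, is given below; the class and function names are hypothetical, and the short test sequences are invented for illustration.

```python
import re
from typing import NamedTuple


class Feature(NamedTuple):
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    reverse: bool   # True when the entry is marked "complement"

    @property
    def length(self):
        return self.end - self.start + 1


_LOCATION = re.compile(r"^(complement)?\s*\(?\s*(\d+)\s*-\s*(\d+)\s*\)?$")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_location(text):
    """Parse entries such as 'complement (412 - 1836)' or '5601 - 6818'."""
    match = _LOCATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return Feature(start, end, match.group(1) is not None)


def extract(sequence, feature):
    """Return the coding-strand sequence of a feature."""
    segment = sequence[feature.start - 1:feature.end].upper()
    if feature.reverse:
        # Reverse complement for features on the opposite strand.
        segment = segment.translate(_COMPLEMENT)[::-1]
    return segment


if __name__ == "__main__":
    fkbW = parse_location("complement (412 - 1836)")
    print(fkbW, fkbW.length)
    print(extract("GGGCAT", Feature(4, 6, True)))
```

For example, the fkbW entry parses to a 1,425-nucleotide feature on the reverse strand, a whole multiple of three as expected for an open reading frame.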



1 GATCTCAGGC ATGAAGTCCT CCAGGCGAGG CGCCGAGGTG GTGAACACCT CGCCGCTGCT
61 TGTACGGACC ACTTCAGTCA GCGGCGATTG CGGAACCAAG TCATCCGGAA TAAAGGGCGG
121 TTACAAGATC CTCACATTGC GCGACCGCCA GCATACGCTG AGTTGCCTCA GAGGCAAACC
181 GAAAGGGCGC GGGCGGTCCG CACCAGGGCG GAGTACGCGA CGAGAGTGGC GCACCCGCGC
241 ACCGTCACCT CTCTCCCCCG CCGGCGGGAT GCCCGGCGTG ACACGGTTGG GCTCTCCTCG
301 ACGCTGAACA CCCGCGCGGT GTGGCGTCGG GGACACCGCC TGGCATCGGC CGGGTGACGG
361 TACGGGGAGG GCGTACGGCG GCCGTGGCTC GTGCTCACGG CCGCCGGGCG GTCATCCGTC
421 GAGACGGCAC TCGGCGAGCA GGGACGCCTG GTCGGCACCT GCGGGCCGGA CGACCGTGTG
481 GTTCGCGGGC GGGCGGTGGC CGGTGGTGAG CCAGCTCTCC AGGGCGGTGA AGGCTGAGCG
541 GTGACACGGC AGCAAAGGCC GGAGTCGGTC GGGGAAGGTG TCGACGAGGG CGTCGGTGTG
601 CGTGCCGTCC TCGATGCGGT AGTAGCGGTA CCGGCCGCCA GGCCGCTGCC GGACATACGC
661 GCGTACACGT CGGAGCCCGG GCGGCAGGCA GCAGCACGTC GAGAGTGCCT GGATGGTGAT
721 CAGCGGCTTG CCGATACGAC CGGTCAACGC GATGCGTTCC ACGGCCGCGT GGACGCJGGA
781 GGAGCGGGTG GCGTAGTCGT AGTCGGCATC GCAGCCCGGG ACCGTCCCCG GGGCGCAATA
841 CGGTGTGCCG GCTTCCTTCT CCCCATCGAA GCCGGGGTCG AACTCCTCGC GGTAGACGCG
901 CTGCGTCAGA TCCCAGTAGA CCTCGTGGTG GTACGGCCAC AAGAACTCGG AGTCGGCCGG
961 GAACCCGGCG CGGAGCAGCG CCTCGCGCGC CTGGCCGGCT GCGGGGCCGC CTGCCGCGTA
1021 GGTGGGGTAG TCGCGCAGGG CGGCCGGCAG GAAGGTGAAG AGGTTGGGAC CCTCCGCGCG
1081 CCACAGGGTG CCTTCCCAGT CGACTCCTCC GTCGTACAGC TCGGGATGGT TCTCCAGCTG
1141 CCAGCGCACG AGGTAGCCGC CGTTGGACAT CCCGGTGACC AGGGTGCGCT CGAGCGGCCG
1201 GTGGTAGCGC TGGGCGACCG ACGCGCGGGC GGCCCGGGTC AGCTGGGTGA GGCGGGTGTT
1261 CCACTCGGCG ACGGCGTCGC CCGGCCGGGA GCCATCACGG TAGAACGCGG GGCCGGTGTT
1321 GCCCTTGTCG GTGGCGGCGT AGGCGTAACC GCGGGCGAGC ACCCAGTCGG CGATGGCCCG
1381 GTCGTTGGCG TACTGCTCGC GGTTACCGGG GGTGCCGGCC ACGACCAGGC CACCGTTCCA
1441 GCGGTCGGGC AGCCGGATGA CGAACTGGGC GTCGTGGTTC CACCCGTGGT TGGTGTTGGT
1501 GGTGGAGGTG TCGGGGAAGT AGCCGTCGAT CTGGATCCCG GGCACTCCGG TGGGAGTGGC
1561 CAGGTTCTTG GGCGTCAGCC CTGCCCAGTC CGCCGGGTCG GTGTGGCCGG TGGCCGCCGT
1621 TCCCGCCGTG GTCAGCTCGT CCAGGCAGTC GGCCTGCTGA CGTGCCGCCG CCGGGACACG
1681 CAGCTGGGAC AGACGGGCGC AGTGACCGTC CGGGGCATCG GGAGCAGGCC GGGCCGTGGC
1741 CGGTGAGGGG AGCAGGACGG CGACTGCGGC CAGGGTGAGA GCGCCGAGGC CGGTGCGTCT
1801 TCTCGGGGCC CGTCCGACAC CGAGGGGCAG AACCATGGAG AGCCTCCAGA CGTGCGGATG
1861 GATGACGGAC TGGAGGCTAG GTCGCGCACG GTGGAGACGA ACATGGGTGC GCCCGCCATG
1921 ACTGAGGCCC CTCAGAGGTG GGCCGCCGCC ATGACGGGCG CGGGACCGCG GGCGCTCCGG
1981 GGCGGTGCCC GCGGCCGCCA CCGGTTCCGG GTCCCCGGGT CAGGGACAGG TGTCGTTCGC
2041 GACGGTGAAG TAGCCGGTCG GCGACTCTTT CAAGGTGGTC GTGACGAAGG TGTTGTACAG
2101 GCCCATGTTC TGGCCGGAGC CCTTGGCGTA GGTGTAACCG GCGCTCGTCG TGGCGCGGCC
2161 CGCCTGGACG TGAGCGTAGT TGCCGGCGGT CCAGCAGACG GCCGTGGCAC CGGTCGTCTG
2221 CGCGGTGACC GCGCCCGAGA GCGGTCCGGC CTTGCCGTCC GCGTCCCGGG CGGCGACCGC
2281 GTAGGTGTGC GATGTGCCCG CCCTCAGGCC GGTGTCCGTG TACGACGTCG TGGCGGACGT
2341 GGTGATCTGG GCACCGTCGC GGTGGACGGC GTAGTCGGTG GCGCCGTCGA CGGGTTTCCA
2401 GGTCAGGCTG ATGGTGGTGT CGGTGGCGCC GGTGGCGGCC AGGCCGGACG GAGCGGGCAG
2461 CGAACCGGGG TCGGAGGCGG ATCCGCTCAG GCCGAAGAAC TGCGTGATCC AGTAGCTGGA
2521 ACAGATCGAG TCCAGGAAGT AGGCGGCGCC GGTGCTGCCG CACTGCTGTG CTCCGGTGCC
2581 GGGATCGACC GGGGTGCCGT GCCCGATGCC CGGCACCCGG TTCACCTCCA CGGCCACCGA
2641 TCCGTCCGCG GCCAGGTACT CCTCGTGCCG GGTGGAGTTC GGGCCGATCA CCGAGGTACG
2701 GTCCGGCGTC TGGGACACGC CGTGCACAGC GGTCCACTGG TCGCGCAACT CGTCGGCGTT
2761 GCGCGGCGCG ACGGTGGTGT CCTTGTCGCC GTGCCAGATG GCCACGCGCG GCCACGGGCC
2821 CGACCACGAG GGGTAGCCGT CACGGACCCG CCGCGCCCAC TGGTCCGCGG TCAGGTCGGT
2881 CCCGGGGTTC ATGCACAGGT ACGCGCTGCT GACGTCGGTG GCACAGCCGA AGGGCAGGCC
2 941 GGCGACGACC GCGCCGGCCT GGAAGACGTC CGGATAGGTG GCGAGCATCA CCGACGTCAT 
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3001 GGCACCGCCG GCGGACAGCC CGGTGATGTA GGTGCGCTGG 
3061 GACGGTGTGA GCGGCCATCT GCCGGATCGA CGCGGCTTCG 
3121 GCTGCTCTGG AACCAGTTGA AGCACCTGTT CGCGTTGTTC 
3181 CACGAGCAGG AAGCCATAGC GGTCCGCGAA TGAGAGCAGG 
5 3241 CTGGGCGTCC TGGGTGCAAC CGTGCAGGGC GAACACCACC 

3301 CGCGGGCCGG TAGACGTACA TGTTCAGCCG GCCCGGGTTC 
3361 GGTCAGGTCC GCCTTGGTCA GACCGGGCTT GGCCAGGCCC 
34 21 CGCCGGGCCG AGCAGGGCCG CTCCGAGTAC GAGGGCCACG 
34 81 CACCCCCCGC CGTCCCGGAC GCGACAACGA CCCGACCGGC 

10 3541 CAGCGGGGTG AGGATTCCCC GGAACGGCGG CGGCTGCATG 
3601 GGGGGGACAC GGAGGGCTCC CTGACGTCGA TCAGTGGGAG 
3661 TAGGGGTGGT TCAACCCGCA ACGGTATGGC CCGGAGCACC 
3721 TGCGCCCGGA CGGATTGTGT CGCCTTGCGG AATCTGATAC 
37 81 ACCCGACACG GGTAGGGCGT CATGGTGTCC GACTCGGCCG 

15 3841 ACGGACCGGG CGTCGGCGGA CCGGGCGTCG GCGGGCTGGG 

3901 CCAGCCGCGT GGGGCGGCCG CGCCCAAGTG CAGTACGCCG 
3961 CGGACCGGTC AGTGCAGTCC CGCGGCCCTG CGGGACCGCT 
4 021 GCGGCGAACC GGGGTCCGTG TCCGCGGCGG TAGACCATCA 
4 081 ACGATGACAC CGTCCTGGTT GTAGCCGATG GTGCGCACGC 

20 4141 CGGCTGGCGG ACTCCCGGGT GTTCAGGACC TCGGACTGCG 
4 201 AAGACCGGGT TCGGCAGCCT GACCCGGTCC CAGCCGAGGT 
4 261 ATGTCGGTGA CGCTCTGCCC GGTGACCAGG GCGAGGGTGA 
4 321 TTGCCCCAGG TGGTGCCCGC CGAGTAGTGG CGGTCGAAGT 
4 381 GTCAGGAGCG TGAGCCAGGA GTTGTCGGTC TCCAGGACCG 

25 44 41 TACACGTCGC CGGTGGTGAA GTCCTCGAAG TAGCGGCCCT 

4 501 GTGCGGGTGG CGTCCTGGTC CGGGTTCTCA GTCGTCATGG 
4 561 CGGTCCGCTG TGAAATGCCG AACCTTCACC GGGCTCATAC 
4 621 ACCGTACGTA GTCGTAGAAC CTCGCCACCA CTGGCGCGCG 
4 681 CCACGCCGAC CGTGCGCCGC GCCTGCGGGT CGTCGAGCGG 

30 4 741 CGGGCCCGGA CGGGCTGCCG GTGAGGGGGG CGACGGCCAC 
4 801 GGGCCCGCAG CGTGCTCAGC TCGGTGCTCT CCAGGACGAC 
4 8 61 CGGCGCACAG CCGGTCGGTG ATCTGGCGCA GTCCGAAGAC 
4 921 CCTCATCGGC CAGCTCCGCG GTCCGCACCC GGCGGCGTCT 
4 981 GGACGAGCAG GCACAGTGCC TCGTCCCGCA GTGGTGTCCA 

35 5041 GTCGTGGGCT GGTCAGCCCC AGGTCCAGCC TGCTGTTGCG 

5101 CGGCGGCGTC GCCGCGCAGT TCGAAGGTGG TGCCGGGAGC 
5161 GGAGGTCGGG CACCAGCCAG GTGCCGTAGG AGTGCAGGAA 
5221 TGTCGGGGTC GATCAGGGCG GTGATGCGCT GCTCGGCGCC 
5281 GCAGGGCGTG GGCGCGGAAG ACCTCGCCGT £CTTGTTGAG 

40 5341 GGTCGAACAG CGGCACGCCC ACTCGTCGCT CCAGCCGCCG 
54 01 GCTGGGAGAT GTTGAGCCGT TCCGCGGTGA TCGTCACGTG 
54 61 TGAACCACTG CAACTCCCGT ATCTCCATGC AGGGACTATA 
5521 CGAGGTTTCG TCATTTCACA GCGGCCGGGC GGCGGCCCAC 
5581 GACCCCATGG GAGGGACCCC ATGTCCGAGC CGCATCCTCG

5641 PCGGGCCCCT GTCCGGTCTG CTCGTGGTTT CTTTGGAGCA
5701 CCACCCGCCA CCTGGCGGAC CTGGGCGCCC GTGTCATCAA
5761 GCGACCTCGC CCGCGGCTAC GACCGCACGG TGCGTGGCAT
5821 TGAACCGGGG GAAGGAGAGC GTCCAGCTCG ATGTGCGCTC
5881 TGCACGCCTT GGTGGACCGG GCCGATGTCC TGGTGCAGAA 

5941 GCCGCCTGGC ATCGGCCACC AGGTCCTCGC GCGGAGCCAC
6001 CATATCCGGC TACGGCAGTA CCGGCTGCTA CCGCGGACCG 
6061 TCCAGTGCGA AGCGGGGCTG GTCTCCATCA CCGGCACCCC 
6121 GCCTGTCCAT CGCGGACATC TGTGCGGGGA TGTACGCGTA 
6181 TGCTGAAGCG GGCCCGCACC GGCCGGGGCT CGCAGTTGGA 

6241 TCGGTGAATG GATGGGATAC GCCGAGTACT ACACGCGCTA
6301 GCGCCGGCGC CAGCCACGCG ACGATCGCCC CCTACGGCCC 
6361 AGACGATCAA TCTCGGGCTC CAGAACGAGC GGGAGTGGGC 
6421 TACAACGCCC CGGTCTCTGC GACGACCCGC GCTTTTCCGG
6481 ACCGCACCGA GCTCGACGCC CTGGTGAGCG AGGTGACGGG 

6541 TGGTGGCGCG GCTGGAGGAG GCGTCGATCG CCTACGCACG
6601 TCAGCGAACA CCCCCAACTG CGTGACCGTG GACGCTGGGC 
6661 GTGCGCTGGA GGGCCTGATC CCCCCGGTCA CCTTCCACGG 
6721 GCCGGGTCCC GGAGCTGGGC GAGCATACCG AGTCCGTCCT
6781 ACAGCGCCGA CCGCGAAGAG GCCGGCCATG CCGAATGAAC
GGGTCCGCGC CGTAGGCGGA
CCCTGGCCCC TGCGGTTGTC 
GACGAGGTGG TCTCGGCGAA 
CCGGAGTTGT CGGCGTAGCC 
GCCGGCTCCG CGGGCAGGGA 
GTGCCGAAGT CCGCGACCTC 
GCCGCGGCGT GGGCCGTCGG 
ACGGCCACGA GACGGG^GAG 
GGCGAGGAGG AGAGGGGGAA 
GCGGCTCCCT CGATGTCGTG 
CGCCCCGGTG CCCGGCACCG 
ACACCCCGCA CCGCGCGATG 
CCGGACGCGA CGAACGCCCC 
GTCGGCCTTG CCTGCCCTGG 
CGGTATGGCG GCCGAGGACG 
ACCGTGGCCG GCGGGAGGGC 
CGTCCCAGAC GGGTTCCACC 
GTGTCCGCTC GAAGGTGATG 
TGATGATGCC TACGTCAGGT 
AGTAGATGGT GTCGCCCTCG 
TGGCCATCAC ATGCTGGGAG
AGGTGGAGTC CACCAGCGGC 
GCAGCGGCGC GGTGTTCTGC 
TGCGGCCCAG GGGGTGGCGG 
GCCAGCCCTC GACCACAGCG 
CGCTCATTCT GGGAAGTCCC 
GTGCGGCGCA TGAGCCCTGG 
TGGTCCTCCG GCGAGTGTGA 
CACGGCGACG GCGTGGTCAC 
ACCGAGGCCG GCGGCGACCA 
CCGCGGCAGG AATCCGGCCG 
CGGCTCCAGT GCCACGAACG 
GGCCAGCCGG TGTCCGGGTG 
CTCCACATCG TCCCCGGCGG 
GACGTCGTCG ACCACGGCGT 
CAGCCGGCGG TACCCGGCGA 
ACCCAGTGCC ACGGTGCCGG 
GGAGACCTCA CTGATCGCGC
CCGGAGCCGG TTCTGGTGCC 
GATGGCCCTG GACAGGGTCG 
CTCGTGCTCG GCCAAGGCCG 
CGTACCGGGC ATGGTCCTGG 
AGTGAGTCCT CACCAACCAG 
CCCTGAACAG GAACGCCCCG 
<3GCCGTCGCC GCTCCGTTCG 
GATCGAACGC CCCGGC*\GCG 
GTCCAGCCAC TTCGTCTGGC 
GCCGGAGGGC AACCGGCACC 
TCTGGCACCC GGCGCCGCGG 
CGAGGCTGAT CACCTGCGGA 
CAAGGCGTAC GACCTCCTGG 
CGAGACCCCG TCCAAGGTGG 
CTCCGGCATC CTCACGGCCC 
GGTCTCGATG CTCGAAGCCC 
CGGCGGCACC GCTCCGGCCC 
GTTCACCACG CGCGACGGGC 
TTCCTTCTGC GGTGTCGTGC 
CAACGCCGAC CGGGTGGCGC 
CACGCTCACC GGCGAGGAAC 
CCAGCGGACC GTGCGGGAGT 
TCCGTTCGAC AGCCCGGTCG 
CGAGCACCCG C<3GCGGCTGG 
GGCGTGGCTG GCCGCGCCCC 
TCACCGGAGT CCTCATCCTG
6841 GCCGCCGTGT TCCTGCTCGC CGGCGTACGG GGGCTGAACA
6901 GCCACCTTTC TGCTCGGGGT GGTCGCACTC GACCGAACGC 
6961 TTCCCCGCGA GCATGTTCCT GGTGCTGGTC GCCGTCACGT 
7021 GTCAACGGCA CGGTGGACTG GCTGGTACGT GTCGCGGTGC
7081 GGAGCCGTCC CCTGGGTGCT CTTCGGCCTG GCGGCACTGC

7141 TCGCCCGCGG CGGTGGCGAT CGTGGCGCCG ATCAGCGTCG
7201 ATCGATCCGC TGTACGCCGG ACTGATGGCG GTGAACGGGG
7261 CCCTCCGGGA TCCTGGGCGG CATCGTCCAC TCGGCGCTGG
7321 AGCGGCGGGC TGCTCTTCGC AGGCACCTTC GCCTTCAACC 

7381 TGGCTCGTCC TCGGGCGCAG GCGCCTCGAA CCACATGACC

7441 ACGGAAGGGG ACCCGGCTTC CCGCCCCGGC GCGGAACACG
7501 GCCGCGCTGG TGCTGGGAAC CACGGTCCTC TCCCTGGACA 
7561 TTGGCGGCGT TGCTGGCGCT GCTCTTCCCG CGCACCTCCC 
7621 GCCTGGCCCG TGGTGCTGCT GGTATGCGGG ATCGTGACCT 

7681 CTGGGCATCG TGGACTCCCT GGGGAAGATG ATCGCGGCGA

7741 GCCCTGGTGA TCTGCTACGT GGGCGGTGTC GTCTCGGCCT
7801 CTCGGTGCCC TGATGCCGCT GTCCGAGCCG TTCCTGAAGT
7861 GGCATGGTGA TGGCCCTGGC GGCCGCGGCG ACCGTGGTGG

7921 AATGGTGCTC TGGTGGTGGC CAACGCTCCC GAGCGGCTGC
7981 TTGCTGTGGT GGGGCGCCGG GGTGTGCGCA CTGGCTCCCG

8041 GTGGTGGCGT GAGCGCAGCG GAGCGGGAAT CCCCTGGAGC 
8101 CTGACGTAGC GTCAAGTCCA CGTGCCGGGC GGGCAGTACG 
8161 TAATCAGATA ACCCTGTCCG ACACGCTGCT CGCTTACGTA 
8221 TGACGAGGTG CTGAGCCGGC TGCGCGCGCA GACGGCCGAG 

8281 GCCGGTGCAG GCCGAGGAGG GACAGTTCCT CGAGTTCCTG

8341 TCAGGTGCTG GAGATCGGGA CGTACACCGG CTACAGCACG 
8401 GGCGCCCGGG GGCCGTGTGG TGACGTGCGA TGTCATGCCG
8461 GCGGTACTGG GAGGAGGCCG GGGTTGCCGA CCGGATCGAC
8521 GACCGTCCTC ACCGGGCTGC TCGACGAGGC GGGCGCGGGG 

8581 GTTCATCGAC GCCGACAAGG CCGGCTACCC CGCCTACTAC

8641 ACGCCGCGGC GGGCTGATCG TCGTCGACAA CACGCTGTTC
8701 AGCGGTGCAG GACCCGGACA CGGTCGCGGT ACGCGAACTC 
8761 GGACCGGGTG GACCTGGCGA TGCTGACGAC GGCCGACGGC
8821 GTGACCGGGG CGATGTCGGC GGCGGTCAGC GTCAGCGTCG 

8881 GGCTCCAGAT GCAGGCGTTC GACGCCGGCG GCGGAAGCGC

8941 GGGCAGTCGG AGTCCGCGAA GCCCGCGAAC CGGTAGGCGA 
9001 TCCGTACGCC GGAAGTCCGC CACCAGGTGC GCCCCCGCGC 
9061 CAGTTCAGGA TCGTCGCACC GGCACCGAAC GACACGACCC 
9121 TTCAGGTGCC ACGTCGACGG CTTCTTCTCC AGCAGGATGA 
9181 CCGAAGCGGT CGCCCATGGT GACGACGAGG ACCTCATGGG

9241 GCAGGTCGGC GTCGGAGTAG TGCACGCCGG TCGCGTTCAT 
9301 GTTCCTCGAC GCGGCTGAGT TCCTCCTCCC CCGCGGGTGC 
9361 GCGAGCGCAG GAAGTCCTCG TCGGGACCGG AGTACGCCTC 
9421 AACCCGCCTG GTACATCAGG CGGCGCCGAC GCGAGTCGAC
9481 ACTCCGGCAG CGACAGGAGC GTGGCCGCCT GCTCGGCCGG

9541 GGTGGAACGC CACCTCGGCA CGCTCGGCGG GCTGGTCGTC 
9601 GTGCGAAGTT CAGCTCCGTG GCGATCTCGC GGACGGACTG 
9661 TGCGGGCCAG CACGAAGTAC TCCGCCACAC CGAGGCGTTC 
9721 CGTGGrCGTT CTTGCTCGCC ACCGCCTGGA GGATGCCGCG
9781 CCTCGCGGAT CTCGTCGGTG AGGACCACCT CGTCGTCCTC
9841 AGGTGTTGTC CAGGTCCCAG ACCAGACACT TGACAATGGT 
9901 GGGAGCGCCA GCGCGTGCTG GGCCAGCATC ACCCGGCACA 
9961 ATCTCCATGA GCTTGGCGTC GCGGTACGCC CGTTCGACGA 
10021 GCCGACGCGA GCACCTGTGC GGCGGTCGCG GCCCCGGCGG
10081 TGCTTGGCCA GGATCGTCGC GGGCACCATC TCGGGCGAGC
10141 GCGTACTCGC ACACGCGGGC CGCGATCTGC TCCGCGGTCC 
10201 GCGACGAGTT GGTGGTCGCC GAGCGGCCGG CCGAACTGCT 
10261 ACCGCGGCGG TGCGGCAGGC CCGCAGGATC CCGACGCAGC 
10321 CCGTAGGCGA GTGACGCCGC GACCAGCATC GGCAGTGACG 
10381 GCGCCGGCCG GCACACGCAC CTGGTCCAGG TGCAGATCGG
10441 CCGGACGGCT TCGGGACGCG CTCGACGCGT ACGCCGGGGG 
10501 ACCGCACCGG AACCATCCTC CTGGAGACCG AAGACGACCA 
10561 GCAGTCGTCC AGACCTTGTG GCCGTCGACG ACAGCGGTGT 
10621 GTCCGCATCG CCGACAGATC GCTGCCCGCC TGCCGCTCAC
TGGGCCTGCT CGCGCTGGTC
CGGACGAGGT GCTGGCGGGT 
TCCTCTTCGG GATCGCCCGC 
GGGCGGTGGG GGCCCGGGTG 
TCTGCGCGAC AGGCGCGGCC 
CGTTCGCCGT CAGGCACCGC 
CCGCAGCCGG CAGTTTCGCC 
AGAAGAACCA TCTGCCCGTC 
TGGCGGTCGC CGCGGToTCA 
TGGACGAGGA CACCGATCCC 
TGATGACGCT GACCGCGATG 
CCGGCTTCCT GGCCCTCACC 
AGCAGGCCAC CAAGGAGATC 
ACGTCGCCCT GCTCCAGGAG 
TCGGCACCCC GCTGCTGGCC 
TCGCCTCGAC CACCGGGATC 
CCGGTGCCAT CGGGACGACC 
ACGCGAGTCC CTTCTCCACC 
GGCCCGGCGT GTACCAGGGG 
CGGCCGCCTG GGCGGCCTTC 
CCGTTTCCCG TGCTGTGTCG 
CCTAGCATGT CGGGCATGGC 
CGGAAGGTGT CCCTGCGCGA 
CTGCCGGGCG GTGGCGTACT 
GTGCGGTTGA CCGGCGCGCG 
CTCTGCCTGG CCCGCGGATT 
AAGTGGCCCG AGGTGGGCGA 
GTCCGGATCG GCGACGwCCG 
CCGGAGTCGT TCGACATGGT 
GAGGCGGCGC TGCCGCTGGT 
TTCGGCCGGG TGGCCGACGA 
AACGCGGCAC TGCGCGACGA 
GTCACCCTGC TGCGGAAACG 
TCGGCGCGGG CCTCGCGGAG 
CCGCCACCTC GGACACGCAG 
TCTCCATCAT GCGGTTGCGG 
GGGCGCCCTG GTCCGTGAGC 
GGCAGGACGT GGCGAGCAGT 
TGCCGACGGC GCCGTGCGGG 
CGGGATCGGT GAGCACGCGC 
CTGGCTGGTC CGCAGCGTCA 
GATCGTCATG GAGAGGTCGA 
CCGGGCCTGG TCGCGCGCGA 
CGTGGACACC GGCGGGCTGA 
GTAGCACCGC ACCTCGGGCA 
GATGAACGCG ATCGTGGTCG 
CGACTTCGGC CCCCATCCGA 
CAGACGCTCC CACGCGAGGT 
GTCGTCGAGC GTGGTGATCA 
CAGCACGGTG CCCCGCCACA 
CATGGCTGTC CTCTCAAGCC 
TCTCGCTGCT GCCCTCGATG 
CGTGTCCCTC TCTCGCGCCT 
CGGCTCGTTC GGCGGCGACG 
CCTCGTCCCA GTGGTCGCTG 
ACAGGTCGGC GATGTGCCCG 
CCCGGGTCCG GGCGTGGGCC 
CCCAGGCGAC CGACTTGCGC 
CGCCGGAGCC GGCCAGGACC 
CGTGGCCGGC GGCGCGGCAG 
TGTCGGCGGG CACGACCACC 
GGTGGTCCGC GTAGGCGGCG 
CCCCGTCGAG CCGAACCCGC 
TGAAGCCGAC GGCCGCGAGT
10681 TTCCCGCTGG TCAGCTCCTT CAGGAAGGTC GCCCGCTGAC CGGCGTCGCC GAGCCGCTGC
10741 ACGGTCCACG CGGCCATGCC CTGCGACGTC ATGACACTGC GCAGCGAACT GCAGAGGCTG 
10801 CCGACGTGTG CGGTGAACTC GCCGTTCTCC CGGCTGCCGA GTCCCAGACC GCCGTGCTCG 
10861 GCCGCCACTT CCGCGCAGAG CAGGCCGTCG GCGCCGAGCC GGACGAGCAG GTCGCGCGGC
10921 AGTTCGCCGG ACGTGTCCCA CTCGGCGGCC CGGTCACCGA CAAGGTCGGT CAGCAGCGCG
10981 TCACGCTCAG GCATCGACGG CCCGCAGCCG GTGGACGAGT GCGACCATGG ACTCGACGGT
11041 ACGGAAGTTC GCGAGCTGGA GGTCCGGGCC GGCGATCGTG ACGTCGAACG TCTTCTCCAG 
11101 GTACACGACC AGTTCCATCG CGAACAGCGA CGTGAGGCCG CCCTCCGCGA ACAGGTCGCG 
11161 GTCCACGGGC CAGTCCGACC TGGTCTTCGT CTTGAGGAAC GCGACCAACG CGTGCGCGAC 

11221 GGGGTCGTCC TTGACGGGTG CGGTCATGAG AACACCTTCT CGTATTCGTA GAAGCCCCGG
11281 CCGGTCTTCC GGCCGTGGTG TCCCTCGCGG ACCTTGCCCA GCAGCAGGTC ACAGGGGCGG 
11341 CTGCGCTCGT CGCCGGTGCG TTTGTGCAGC ACCCACAGCG CGTCGACGAG GTTGTCGATG 
11401 CCGATCAGGT CCGCGGTGCG CAGCGGCCCG GTCGGATGGC CGAGGCACCC CGTCATGAGC
11461 GCGTCGACGT CCTCGACGGA CGCGGTGCCC TCCTGCACGA TCCGCGCCGC GTCGTTGATC

11521 ATCGGGTGGA GCAGCCGGCT CGTGACGAAG CCGGGCGCGT CCCGGACGAC GATCGGCTTG
11581 CGCCGCAGCG CCGCGAGCAG GTCCCCGGCG GCGGCCATGG CCTTCTCACC GGTCCGGGGT 
11641 CCGCGGATCA CCTCGACCGT CGGGATCAGG TACGACGGGT TCATGAAGTG CGTGCCGAGC 
11701 AGGTCCTCGG GCCGGGCCAC GGAGTCGGCC AGTTCGTCAA CCGGGATCGA CGAOGTGTTC 
11761 GTGATGACCG GGATACCGGG CGCCGCTGCC GAGACCGTGG CGAGTACCTC CGCCTTGACC

11821 TCGGCGTCCT CGACGACGGC CTCGATCACC GCGGTGGCCG TACCGATCGC GGGCAGCGCG
11881 GACGTGGCCG TCCGCAGCAC ACCGGGGTCG GCCTCGGCGG GCCCGGCCAC GAGTTGTGCC 
11941 GTCCGCAGTT CGGTGGCGAT CCGCGCCCGC GCCGCCGTAA GGATCTCCTC GGACGTGTCG 
12001 ACGAGTGTCA CCGGGACGCC GTGGCGCAGC GCGAGCGTGG TGATGCCGGT GCCCATCACT 
12061 CCCGCGCCGA GCACGATCAG CTGGTGGTCC ACGCTGTTTC CTCCCTCCGG GGTCACCATG 

12121 GCAGCGAGTA CGGGTCGAGG ACGTCTTCCG GGGTCGACCC GATCGCGTCC TTGCGGCCGA
12181 GGCCGAGTTC GTCGGCGAAG CCGAGCAGCA CGTCGAACGC GATGTGGTCG GCGAACGCGC 
12241 TGCCCGTCGA GTCGAGGACG CTCAGGCTGT CCCGGTGGTC CGCCGCGGTG TCCGGTGCCG 
12301 CGCACAGGGC CGCCAGCGAC GGGCCGAGCT CGCGGTCCGG CAGTTGCTGG TACTCGCCCT 
12361 CGGCGCGGGC CTGCCCCGGA TGGTCGACGC AGATGAACGC GTCGTCGAGC AGGGTC^TCG 

12421 GCAGTTCGGT CTTGCCCGGC TCGTCGGCGC CGATGGCGTT CACATGCAGG TGCGGCAGCC
12481 GCGGCTCGGC GGGCAGCACC GGCCCTTTGC CCGAGGGCAC CGAGGTGACG GTGGACAGGA
12541 CATCCGCGGC GGCGGCGGCC TCCGCCGGAT CGGTCACCTT GACCGGCAGT CCGAGGAACG 
12601 CGATGCGGTC CGCGAACGAC GCCGCGTGGC CGGGGTCGGT GTCGCTGACC AGGATCCGCT 
12661 CGATGGGCAG GACCCTGCTG AGCGCGTGCG CCTGGGTCAC CGCCTGTGCG CCCGCGCCGA 

12721 TCAGCGTGAG CGTGGCGCTG TCGGACCGGG CCAGCAGCCG GCTCGCGACG GCGGCGACCG
12781 CGCCGGTGCG CATCGCGGTG ATCACGCCTG CGTCGGCGAG GGCGGTCAGA CTGCCGCTGT 
12841 CGTCGTCGAG GCGCGACATC GTGCCGACGA TCGTCGGCAG CCGGAAGCGC GGATAGTTGT 
12901 GCGGACTGTA CGAAACCGTC TTCATGGTCA CGCCGACACC CGGGACCCGG TACGGCATGA 
12961 ACTCGATGAC GCCGGGAATG TCGCCGCCGC GGACGAATCC GGTACGCGGC GGCGCCTCGG 

13021 CGAACTCGCC GCGGCCGAGC GCGGCGAACC CGTCGTGCAG CTCGCTGATC AGCCGGTCCA
13081 TCATCACGTC GCGGCCGATC ACGGAGAGAA TCCGCTTGAT GTCACGTTGG CGCAGGACCC 
13141 TGGTCTGCAT GTGTCACCTC CCTTTCGTGG CCGGAGCTCT CTTGGTGGTG CCGCTCGGGG 
13201 CGGCTTCCGT TCTCATCGCA GCTCCCTGTC GATGAGGTCG AAAATCTCGT CCGCGGTCGC 
13261 GTCCGCGGAC AGCACGCCGG CCGGCGTGGT CGGGCGGGTC TCCCGCCGCC AGCGGTTGAG

13321 CAGGGCGTCC AGCCGGGTTC CGATCGCGTC CGCCTGGCGG <3CGCCCGGGT CGACACCGGC
13381 AACGAGTGCT TCCAGCCGGT CGAGCTGCGC GAGCACCACG GTCACCGGGT CGTCCGGGGA 
13441 CAGCAGTTCA CCGATGCGGT CGGCGAGTG1C GCGCGGCGAC <9GG TAGTCG A AGACGAGCGT 
13501 GGCGGACAGT CGCAGACCGG TCGCCTCGTT GAGGCCGTTG CGCAGCTGCA CCGCGA^GAG
13561 CGAGTCCACA CCGAGTTCCC GGAACGCCGC GTCCTCCGGG ATGTCCTCCG GGTCGGCGTG 

13621 GCCCAGGACG GCCGCTGCCT TCTGCCGGAC GAGGGCGAGC AGGTCGGTGG GGCGTTCCTG
13681 CTCGTTGCGG GCGCTCCGGC GGGCCGACGG CTTGGGCCGG CCACGCAGCA GCGGGAGGTC 
13741 CGGCGGCAGG TCGCCCGCCA CGGCGACGAC ACTGCCCGTT CCGGTGTGGA CGGCGGCGTC 
13801 GTACATGCGC ATGCCCTGTT CGGCGGTGAG CGCGCTCGCC CCACCCTTGC GCATACGGCG 
13861 CCGGTCGGCG TCGGTCAGGT CCGCGGTCAG GCCACTCGCC TGGTCCCACA GCCCCCACGC 

13921 GATCGACAGC CCTGGCAGCC CTTGTGCACG CCGGTGTTCG GC<3AGCGCGT CGAGGAACGC
13981 GTTCGCCGCC GCGTAGTTGC CCTGACCGGG GGTGCCCAGC ACACCGGCCG CCGACGAGTA
14041 GACGACGAAT GCGGCGAGGT CGGTGTCGCG GGTGAGCCGG TGCAGGTGCC AGGCGGCGTC
14101 GGCCTTGGGT TTGAGGACGG TGTCGATGCG GTCGGGGGTG AGGTTGTCGA GCAGGGCGTC 
14161 GTCGAGGGTT CCGGCGGTGT GGAAGACGGC GGTGAGGGGT TGAGGGATGT GGGCGAGGGT 

14221 GGTGGCGAGT TGGTGGGGGT CGCCGACGTC GCAGGGGAGG TGGGTGCCGG GGGTGGTGTC
14281 GGGGGGTGGG GTGCGGGAGA GGAGGTAGGT GTGGGGGTGG TTCAGGTGGC GGGCGAGGAT
14341 GCCGGCGAGG GTGCCGGAGC CGCCGGTGAT GACGACGGCC CCCTCGGGGT CCAGCGGCCG
14401 CGGGACCGTG AGGACGATCT TGCCGGTGTG CTCGCCGCGG CTCATGGTCG CCAGCGCCTC
14461 GCGGACCTGC CGCATGTCGT GCACCGTCAC CGGCAGCGGG TGCAGCACAC CGCGCGCGAA
14521 CAGGCCGAGC AGCTCCGCGA
14581 GAACGGTCGC TGGACGGCGT
14641 CGGCGCGAGC AGGCCGACGG
14701 GTCGACCGGC GGGAACGCGT
14761 GTCCAGGTCC ACCAGATGGC
14821 GTGCCGCGCG ATCTGCCGGG
14881 CTTCTCGGCG GGGCGCAGCC
14941 GGTCATCACG GACGCCGCCT
15001 GTGGTCGGCG ATGACCGTGG 

15061 CGGTGCCAGA CCGGAGACGT
15121 GAGCACGCCC TGACCGGGGT 
15181 CGCCGCACGC ACACCGATCC 
15241 GTCGGCCGCG GTGAGGCCGT 
15301 GCTGTCCGGC ACGGTGAGCG 

15361 GCCGCGCAGC CGCAGACGCG
15421 GAGCGTGACG CCGGACTCGG 
15481 GGCGCGCAGC AGTCCGGCCG 
15541 ATCCCCGCCG GAGCCGGTCA 
15601 CACCGGGTCG TCGCCATCAG

15661 ATCCGTGGGT GCGGCGACCT
15721 GGACAGCGGG CGGGTGCGGA 
15781 GACGCGCAGA CTCAGCTCGT 
15841 CGTGGCGACG AACCGGGCCC 
15901 CGTGGTGAGG GCGACGGCGT 

15961 GTCCGCCTCG GCGGCCTGCT
16021 ACGCCAGGCA GCCCGCAACC 
16081 TTCGTCATAG AACCCCGAGA 
16141 CGGCTCCACA CCGACAACAC 
16201 GTGCCGGGTC CAGCTGCCCG 

16261 GGCCTCATCA GCCCCTTCCA
16321 AAGGGGGGAT TCGATGACCA 
16381 GATGACCAGC TCCACAAACG 
16441 AGCCAGCCAG GGGTGAGTGC 
16501 GGCGGGCAGC GCTGTGACAG 

16561 CGACAGATCG GTGGCACCGG
16621 GGGCAGATCC AGCAGCCGTC 
16681 GCCCAGGGTC CACGCCTGCG
16741 CCGCAACGAC GCCACCGTGT
16801 GCACTCCACG AACACCGACC 

16861 ACGCAGATTC CGGTACCAGT
16921 GGTCGACCAC CACGCCACCG 
16981 TTCATCCTCG ATGGCTTCCA 
17041 CACCCGCACG CCTTCGGCCT 
17101 CGCCACCACC GTCGAAGCCG 

17161 GACCTCACCG GCCGGCAACG
17221 GATGACCTGA CTGCGCAATG 
17281 CACGCACGCC GCCGCGATCT 
17341 ATGCGCCTGC CACAGCGCGG
17401 CTCCACCCGC TCCGCCACAT

17461 CGGCAGCAAC GCCTGAGCGC
17521 GAGTTCCACG CCCATGCCGA
17581 CGGCTGGTCC ACCGCCACAC
17641 GAAGACAGCA CGCTCCCGCA 
17701 GCGCAGATAC CCCTCCAGCC 

17761 CACCGGCAAC GGCACCAACC
17821 CTCAAGGATC ACGTGCGCGT 
17881 TGCCCGATCC GACTCGGGCC 
17941 CCAGTCCACA TGCGACGACG 
18001 CATCGCCATG ACCATCTTGA 

18061 GTTCGACTTC AACGAACCCA
18121 AATGGCCTGC GCCTCGATGG 
18181 GTCCACATCG GCGGCGCGCA 
18241 GGACGGGCCG TTGGGGGCGG 
18301 GCGGACGACC GCGAGAACGG
TGATCTCCTT GAGCCGGTCG GGCCCCGCGT CCATCAGGTC
GCCGGATGTG CGTCTTGGCC ATCTGGATGA ACGGGGCAGC 
ACGCGTCGAG GAGTTCACCG GTGAGCGAGT TGAGCACGAC 
CGGCGAACGC GGTGCTGCGG GAATCGGCCA GATGCGCTCC 
GCTTCGCGGC GCTGGTGGTC GCGTACACCT CCGCGCCCAG 
CGGCGGAACC GACACCGCCG GTGGCCGCGT GGATCAGGAC 
CGGCGAGGTC GACCAGGCCG TACCACGCGG TCGCGAACGC 
GCGGGAACGT CCAGCCGTCC GGCATCCGGC CGAGCATCCG 
GGCCGAAGCC GGTGCCGACG AGGCCGAAGA CGCGGTCGCC 
CGGCGCCGGT CTCCAGGACG ATGCCCGCGG CCTCGCCGCC 
AGGTGCCGAG CGCGATCAGC ACATCGCGGA AGTTGAGGCC 
GGACCTCGGC CGGGGCGAGG GGGCGCCGGG GCTCCGCCGA 
CGAGGGTGCC CGTCCGCGCC GGCCGGATCA GCCACGTGTC 
GCTCCGGCAC CCGGGTGAGG CGGGCCGCCT CGAACCGGCC 
GCTCGCCGAG TGCGACGGCG ATGCGCTGCT GCTCGGGGGC 
TCTCGACGTG GACGAACCGG CCGGGCTGCT CGGCCTGGGC 
CGGCGCCGGT GGCGAGGCCC GCGGTGGTGT GCACGAGCAG 
GGGCGGTCAG CAGCCGGGTG GTGAGCGCAC GCGTCTCGGC 
CGGCAGGCAA CGTGATGACG TCCACGTCGG TCGCGGGGAC 
CGATCCAGGT GAGACGCATC AGGCCGGTGC CGACGGGTGG 
CCGTCCGGAT CTCGGCGACG AGTTGGCCGG CGGAGTCGGC 
CGCCGTCACG AGTGATCACG GCTCGGAGCA TGGCCGAGCC 
CCTTCCAGGC GAACGGCAGA CCCGCAGCGC TGTCGTCCGG 
GCAGGGCCGC GTCGAGCAGC GCCGGATGCA CACCGAAACC 
CGTCGGGCAG CGCCACCTCG GCATACACGG TGTCACCATC 
CCTGGAACGC CGACCCGTAC TCATAACCGG CATCCCGCAG 
CGTCGACGGC CACGGCCGTG ACCGGCGGCC ACTGCGAGAA 
CGGGGGTGTC GGGGGTGTCG GGGGTCAGGG TGCCGCTGGC 
TGCCCTCGGT ACGCGCGTGG ACGGTCACCG GCCGCCGTCC 
CGGTCACCGA CACATCCACC GCTGCGGTCA CCGGCACCAC 
GCTCGTCCAC TATCCCGCAA CCGGTCTCGT CACCGGCCCG 
CCGTACCCGG CAGCAGGACC GTGCCCCGCA CCGCGTGATC 
GCAATGAGAT CCGGCCAGTG AGAACAACAC CACCATCGTC 
CGGCCAGCAT CGGATGCGCC GCACCCGTCA ACCCCGCCGC 
CCGCCTCCAG CCAGTACCGC CTGTGCTCGA ACGCGTACGT 
CCGGCACCGG TTCGACCACC GTGTCCCAGT CCACTGCCGT 
CCAACGCCGT CAGCCACCGC TCCCAGCCGC CGTCACCGGT 
GAGCCTGCTC CATCGCCGGC AGCAGCACCG GATGGGCACT 
CATCCAGCTC CGCCACCGCC GCGTCCAACG CCACCGGACG 
ACCCCTCATC CACCGGCTCC GTCACCCAGG CGCTGTCCAC 
ACGCGGCCTT CCCTGCCACC CCCTCCAGTA CCTTGGCCAG 
CGTGGGGCGT GTGGGAGGCG TAGTCGACCG CGATACGACG 
CATACCGCGC CACCACCTCC TCCACCGCCG ACGGGTCCCC 
GGCCGTTACG CGCCGCGATC CACACACCCT CGACCAGACC 
CCACCGAAGC CATCGCTCCC CGCCCGGCCA GTCGCGCCGC 
CCACCACGCG GGCGGCGTCC TCGAGGCTGA GGGCTCCGGC 
CGCCCTGGGA GTGTCCGATC ACCGCGTCCG GCACGACCCC 
CCAGGCtCAC CGCGACCGCC CAGCTGGCCG GCTGGACCAC 
CCGGCCGCGC CAACATCTCC CGCACATCCC AGCCCGTGTG 
ACTCCTCCAT ACGCGCGGCG AACACCGCGG AGTGGGCCAT 
CCCACTGGGC GCCCTGGCCG GGGAAGACGA ACACCGTACG 
CCGTCACCCG GGCATCGCCC AGCAGCACCG CACGGTGACC 
CCAACCCCTG CGCGACCGCG GCCACATCCA CACCACCCCC 
GCTCCACCTG CCCCCGCAGA CTCACCTCAC CACGAGCCGA 
CGTCAACAAC CGACTCCCCA CGCGACGGCC CAGGAACACC 
TCGTACCGCT CACCCCGAAC GACGACACAC CCGCATGCGG 
ACGGCCTCGC CTCGGTGAGC AGCTCCACCG CACCGGCCGA 
GCTCGTCCAC ATGCAGCGTC TTCGGCGCGA TCCCGTACCG 
TCACACCGGC GACACCCGCC GCCGCCTGCG CATGACCGAT 
GCAGCAGCGG AACCTCACGC TCCTGCCCGT ACGTCGCCAG 
GATCGCCCAG CGTCGTCCCC GTCCCGTGCG CCTCCACCAC 
GTCCGGCGTT CACCAACGCC TGCTGGATGA CACGCTGCTG 
ACAGCCCGTT GGAGGCACCG TCCTGGTTCA CCGCCGACCC 
TGTGTCCGTT GCGCTCGGCG TCGGAGAGCC GCTCCAGCAC
18361 AAGAACGCCG GCGCCCTCCG
18421 GCGGCCGTCG GGGGAGAGTC
18481 CATGACGGTG ACACCGCCGA
18541 GGCCTGGTGC AGCGCGACCA
18601 CTGGAGCCCA TAGAAGTACG
18661 GCCGAACCCG TCCAGGTCCG
18721 GCCGGTGTCG CTGCCGCGCA 
18781 TGTCGTTTCC AGCAGGATCC 
18841 GCCGAAGAAC GCGGCATCGA 

18901 CGATCCGCCG GTGAGGCCGG
18961 GTCGCCGCCA CTGTCCACCA
19021 TCGGCAGGCC ATGCCCACGA 
19081 AGCGACCGGT GCGGCACCAC 
19141 CGTCGGGTAG TCGAAGACAA 

19201 GTTCCGCAGT TCGACGGCGG
19261 GGACACGTCC GCGGCGTCCG
19321 CAGCAGCGCG GTGTCCCGCT 
19381 GGCGGTGGCC GCCGCCGGGC 
19441 GTGCGCGGTG AGGTCCATCG 

19501 TTCCAGCAGG CGCATGCCCA
19561 GGTGCGGTTG GTGCCGCTCA 
19621 GGCCAGCGAC AGCGCGGGCA 
19681 CCCGTTCGCC GCCGAGTAGT 
19741 GTAGAGGACG AACGAGCGCA 

19801 GTCGGCTTTG GGGCGCAGTG
19861 GTCGTCGAGC ACGGCTGCCG 
19921 CGCGGCGGCG AGCTGGTCCC 
19981 CGCCGGCGGT TCGCTGCGCG 
20041 ATGCCGGGCG AGGAGACCTG 

20101 CGGGTCGAGC AGCGGTTCGG
20161 GTACCGGCCG TCGGTGACGC 
20221 CTCGATGGGG GTGTCGGTGC 
20281 GGCGGACCGG ACGAGGCCGG
20341 GAGGGTGGTC TCCGCAGGGC 

20401 CTCGGTGAGC CGGTACGTCT
20461 GATGTGGACC GCGTCCGCAG
20521 GTACAAGGAG TTCCGTACGA 
20581 CGCGGCGACG GTCACCACCG 
20641 CGGGCCCTGA GTGATCGTGA 

20701 GCTCCACGAG AACGGCAGCC
20761 GACGTGCAAG GCCGCGTCGA 
20821 CTGTTCCCCG GCGATCTCCA 
20881 CAGTCCCTGG AACGCTGGGC
20941 GCTCACGTCG ACGCGTCGCG

21001 GCTTCCGGCC CGGCCGAGGG
21061 ACGCGCGTGG ACGGTCACTC 
21121 CACATCCACC GCGCCGGTCA 
21181 CACCCCGCAA CCGGTGTCGT 
21241 CAGCAGAACC GTGCCCCGCA 

21301 CCGGCCAGTG AGAACAACAC
21361 CATCGGATGC GCCGCCCCGG 
21421 CAGCCAGTAC CGCCTGTGCT 
21481 CGGTTCGACC ACCGTGTCCC 
21541 CGTCAGCCAC CGCTCCCAGC 

21601 TTCCATCGCC GGCAGCAGCA
21661 CTCCGCCACC GCCGCGTCCA 
21721 ATCCACCGGC TCGGTCACCC 
21781 CCCGCCGGAA ATCCCCTCCA 
21841 CGTGTGGGAG GCGTAGTCGA 

21901 CGTCACCACT TCTTCCACCG
21961 ACGCGCCGCG ATCCACACGC 
22021 AGGCATCGCC CCCCGCCCGG 
22081 GCGGGCGGCG TCCTCAAGGC 
22141 GGAGTGTCCG ACCACCGCGT
CCCAGCCGGT GCCGTTGGCG GCGTCCGCGA ACGCGCGGCA
CGCCCTGCTG CTGGAATTCC ACGAACCCGG TCGGGGTCGC 
CCAGCGCCAG CGAGCACTCC CCGTGGCGCA GTGCGTGCCC 
GCGACGACGA GCACGCCGTG TCCACCGTGA ACGCCGGTCC 
AGATCCGGCC GGTGAGCACG CTGGGCTGCA TGCCGATCGA 
CGCCGACGCC GTACCCGTAC GAGAAGGCGC CCATGAACAC 
GTGTGCCCGG CACGATGCCC GCGCTCTCGA ACGCCTCCCA 
GCTGCTGGGG GTCCATGGCC CGTGCCTCAC GGGGGCTGAT 
AGCCGGCGGC GTCGGAGAGG AAGCCGCCGC GGTCCGTGTC 
ACGGGTCCCA GCCACGGTCG GCCGGGAAGC CGGTGACCGC 
TGCGCCACAG GTCGTCGGGC GAGGTGACGC CGCCCGGCAG 
TGGCCAGCGG TTCGTCACGG GTCGCGGCGG CTGTGGGAAC 
CGACCAGAGC CTCGTCCAAC CGCGACGCGA TGGCCCGCGG 
GCGTGGCGGG CAGTCGGACA CCGGTCGCCG CGGCGAGTCG 
TCAGCGAGTC GATACCCAGT TCCTTGAAGG CCGCGTCCGC 
CGTGGCCGAG CACCGCCGCC GCGTTGTCGC GGACCAGTGC 
CAGCGCCGGA CATGGTGCCG AGCCGGTCGG CGAGCGGAAC 
GCGATACGGC GCGGCGCAGA TCGGCGAAAA GCGGCGATGT 
TGGCCGCCAC GGCGAACGCG GTGCCGGTTC CGGCCGCGGC 
CACCGGCCGA CATGGGGCGG AAACCGCCGC GGCGGACACG 
TGCTGCCGGT GAGTCCGCTG TCATCGGCCC AGAGGCCCCA 
GTCCTTCGGC ATGGCGCAGC GTCGCGAGTC CGTCGAGGAA 
TGCCCTGGCC GCGGCCGCCC ATGATGCCCG CGACGGACGA 
GGTCCGCGTC CCGGGTCAGC TCGTGCAGGT GCCAGGCGCC 
TGGTGGCGAG CCGCTCCGGG GTGAGTGCCG TGGTCACGCC 
TGTGGAAGAC CGCCGTGAGC GGCCTGCCGG CGGCGGCGAG 
GGTCGGCGAC GTCACAGCGG ATGTGGACAC CGGGAGTGTC 
ACAGCAACAG GAGGTGGCGG GCGCCATGCT CGGCGACGAG 
CCAGCACACC CGAGCCGCCG GTGATGACCA CCGTGCCGTC 
GCGTTTCCGC GGCGGCCGTG CGGGTGAACC GCGGCGCTTC 
GGACGTACGG CTCGGCCAGT GTCGTGGCGG CGGCCAGCCC
CGGTCTCCAC CAGCACGAAC CGGCCCGGGT GCTCGGCCTG 
CGACCGCTCC TCCGACCGGT CCCGCGTCGA TCCGGACGAC 
CGTCCTCGGC GATCACCCGG TGCAGCTCGC CGAGCACGAA 
CGTCGAGGAC ATCCGCGCCC GGTTCCGGGA GCGCGGAGAC 
GACCGGGCCC GGGAGTGGGC AGCTCGGTCC AGGAGAGGCC 
CGGCGGCGTC GCCGTCGACG TTCACCGGTC GCGCGGTCAG 
GTTGGCCGAC CGGGTCCGTC GCATGCACGG CAGCGCCGTC 
CGCGCAGCGT GGTGGCCCCG GTCGTGTGGA ACCGCACGCC 
GCACCTCCGC TTCCTGTTCC GCGAGCAGCG GCAGGCAGGT
ACAGCGCCGG GTGGACGCCA TAGTGCGGCG TGTCGTCCGC 
CCTCGGCGTA CAGGGTTTCG CCGTCGCGCC AGGCGGTGCG 
CGTAGCTGTA GCCGGTCTCG GCCAGCCGCT CGTAGAACGC 
CGCCCGGCGG CGGCCACGCG GGCGGCGGGA CCGCCGCGAC 
TGCCGCTGGC GTGCCGGGTC CAGCTGTCCG TGCCCTCGGT 
GCCGCCGTCC GGCCTCATCG GCCCCTTCGA CGGTCACCGA 
GCGGCACCAC GAGCGGGGTC TCGATGACCA GTTCATCCAC 
CACCGGCCCG GATGACCAGC TCCACAAACG CCGTACCCGG 
CCGCGTGATC AGCCAGCCAG GGATGCGTAC GCAACGAGAT 
CACCACCGTC GTCGGCGGGC AGTGCTGTGA CGGCGGCGAG 
TCAGCCCGGC CGCGGACAGA TCGGTGGCAC CGGCCGCCTC 
CGAACGCGTA GGTGGGCAGA TCGAGCAGCC GTCCCGGCAC 
AGTCCACTGC CGTGCCCAGG GTCCACGCCT GCGCCAACGC 
CGCCGTCACC GGTCCGCAAC X3ACGCCACCG TGTGAGCCTG
CCGGATGGGC GCTGCACTCC ACGAACACGG ACCCGTCCAG 
GCGCGACGGG GCGACGCAGG TTCCGGTACC AGTAGCCCTC 
AGGCGCTGTC CACCGTGGAC CACCAGGCCA CCGACCCGGT 
GTACCTCGGC CAACTC<3TCC TCGATGGCTT CCACGTGGGG 
CCGCGATACG GCGCACTCGC ACGCCTTCGG CCTCGTACCG 
CGGACGGGTC CCCCGCCACC ACAGTCGAAG ACGGGCCGTT 
CCTCGACCAG GTCCACCTCA CCGGCCGGCA ACGCCACCGA 
CCAGCCGCCC GGCGATCACC TGGCTGCGCA AGGCCACCAC 
TGAGGGCTCC GGCCACACAC GCCGCCGCGA TCTCGCCCTG 
CCGGCACGAC CCCATGCGCC TGCCACAGCG CGGCCAGGCT
22201 CACCGCGACC GCCCAGCTGG
22261 CGCCAACATC TCCCGCACAT
22321 CATACGAGCC GCGAACACCG 
22381 AGCACCCTGC CCGGGAAAGA 
22441 CCGGGCATCG CCCAACAACA
22501 CTGCGCGACC GCGGCCACAT 
22561 CTGCCCCCGC AGACTCACCT
22621 AGCCGACTCC CCACGCGACG 
22681 GCTCACCCCG AAAGCGGAGA 

22741 CGCCTCGGTG AGCAGTTCCA
22801 CACATGCAGC GTCTTCGGCG 
22861 GGCGACACCC GCAGCCGCCT
22921 CGGAACCTCA CGCTCCTGCC 
22981 CAGCGTCGTC CCCGTCCCGT

23041 CTTGTGGAGG GCCTGGCGGA
23101 GTTGGAGGCG CCGTCCTGGT 
23161 GTTGCGCTCG GCGTCGGAGA 
23221 GGTGCCGTCC GCCGCGTCAG 
23281 CCGGGAGAAC TCCACGAAGG 

23341 CAGCGAGCAC TCCCCGGTCC
23401 CGAACACGCC GTGTCGACCG
23461 TCCGGCGAGC ACCGCGGGCT
23521 GCCGTAGCCG TAGTAGAAGC 
23581 CGGCACGATG CCGGCGTGTT 

23641 CGGGTCGAGT GCGGTGGCCT
23701 GGCGCCCGCG AGTGCGCCGG 
23761 CACGTCCCAG CCGCGGTCGG
23821 CTGCCACAGC TCTTCCGGTG 
23881 GGCGAGCGGC TCGTTCGCCG 

23941 GTCCTTGACC GACGTCCGCA
24001 TCAGCACGTG CGCGATGAGC
24061 CCGCGGTCGT GGTGCTCGCG
24121 TGTCGTCCGG GGTCCCGTTG 
24181 CGCCGGCGGC GGGATAGTCG 

24241 AGAGCCGGTT GCGCAGGCCG
24301 TGGTGGCCGT GACCGCCGCC
24361 CGACGCCGAG CAGCACCTGT
24421 GGGAGCCGCC GTCGGTCGCG
24481 ACGGGTCGCC GGGCCCGGGT

24541 CGGCGTCGAG GAGGTCGGTC
24601 CTTGTGCCCG GCGCAGGTCG
24661 CGGCGAGAAC GAACGCGGTC
24721 ACTCGGCGGT GCCGTCCGCG 
24781 GCTCGTACCG GATCACTTCG 

24841 CGCCCGCGAG GAGGACGGTG
24901 CGAGGCGGGG CGCTTCGAGG
24961 AGAGGGCGGC GGCGCGGCGG
25021 CCGGTTCCGC GGTGTCGAGC 
25081 ACACCACCAG CGTGGCGCCG 

25141 GACCGGATAC CGGGACGACG
25201 GGCGGGCCGT GGTGCCGGGT 
25261 GCCGCACGTC CCCGTCCGGG 
25321 GAGCCACCGG CCGTCCCAGT 
25381 CGTGGACGAA GGTGACGCGC 

25441 ACGCGAACGG CAACCGTACC
25501 GCGCGGTGAC GAGCAGCGCC 
25561 CGTCGAGGGC GACTTCGGCG 
25621 GGAACTCGGG GCCGAACTCG 
25681 CGACCGGTTC CGCGTGCTCG 

25741 CGATGCCGGC GAAGCCGGAG
25801 GGACGCGCAC GGCACGGCGT 
25861 CGGCGCCGGT GGCGGGCAGG 
25921 AGCCTGCCTC GTCGGCGCCG 
25981 CGGCGCCGTC GACGGAGTGA
CCGGCTGGAC CACCTCCACC CGCTCCGCCA CATCCGGCCG
CCCAGCCCGT GTGCGGCAAC AACGCCCGCG CACACTCCTC 
CAGAACACGC CATCAACTCC ACACCGATGC CCACCCACTG
CGAACACCGT ACGCGGCTGA TCCACCGCCA CACCCATCAC 
CCGCACGGTG ACCGAAGACA GCACGCTCAC GCACCAACCC 
CCACACCACC CCCGCGCAGA TACCCCTCCA GCCGCTCCAC 
CACTCCGAGC CGACACCGGC AACGGCACCA ACCCATCGAC 
GCCCGGGAAC ACCCTCAAGG ATCACGTGCG CGTTCGTACC 
CACCGGCCCG GCGCGGACGT CCCGCGTCGG GCCACGCCCG 
CCGCGCCCTC GGTCCAGTCC ACATGCGACG ACGGCTCGTC 
CGATGCCATA CCGCATCGCC ATGACCATCT TGATGACACC 
GCGCATGACC GATGTTCGAC TTCAACGAAC CCAGCAGCAG 
CGTACGTCGC CAGAATCGCG TGCGCCTCGA TGGGATCGCC 
GCGCCTCCAC CACGTCCACG TCGGCGGGGG CGAGCCCCGC 
TGACGCGCTG CTGGGAGGGG CCGTTGGGTG CGGAGATGCC 
TGACGGCGGA GGAGCGGACG ACCGCGAGGA CGGTGTGTCC 
GCTTTTCGAC GACGAGGACG CCGGCCCCCT CGGCGAAACC 
CGAACGCCTT GCACCGTCCG TCCGGCGCGA CGCCGCCCTG 
TCTGTGGTGA TGCCATCACT GTGACACCAC CGACCAGCGC 
GCAGCGCCTG CCCGGCCTGG TGCAGCGCGA CCAGCGACGA 
TGACCGCCGG ACCCTCCATG CCGAAGAAGT ACGACAGCCG 
GTGTGCTGTA GGCGCCGAAT CCGCCCAGGT CCGCGCCCGT 
CGCCGACGAA GACGCCGGTG TCGCTGCCGC GCAGGGTGTC 
CGAGCGCCTC CCAGGCGATT TCGAGGAGGA TCCGCTGCTG
CGCGCGGACT GATGCCGAAG AACGCGGCAT CGAAGTCGGC 
CCCGCCCGGT GGCGGACTCG GCGGCGGCGT GCAGCGCGGC 
TGGGGAAGTC GCCGATCGCG TCGCGGCCGT CCGCGACGAG 
AGGTGACGCC GCCCGGCAGT CGGCAGGCCA TGCCGACGAC 
CGGCGCGCAG CGCGGTGTTC TCCCGGCGGA GCTGCGCGTT 
GCGCCTCGAT CAGGTCGTTC TCGGCCATCG CCTCATCCCT 
GCGTCTGCGT CCATGTCGTC GAACAGTTCG TCGTCCGGCT 
GGTGCCTGTG CCGGTGGTTC ACCGCCGTCC GGGGTCCCGT
ACGTCCGGGG CCAGGAGGGT CAGCAGATGA CGGGTGnGCG 
AAGACGAGCG TGGCCGGCAG CGGAATGCCG AGGGCCTCGG 
AGCGCGGTGA GCGAGTCGAC CCCGAGGTCC TTGAACGCCG 
GCGTCGGTGT GGCCCAGCAG GGTGGCGGCG GTGTCGCGGA 
TCCCGTTCCT TGTGGGGCAG GTCCGGCAGG CGTTCCAGCA 
GAGCGCCGGG TGGGGCGCTG GATCGGTCGC CACAGCGGTG 
GGGGCGGTCG CCACGACCAC GGCTTCCCCG GTGGCGCACG 
AGCCGGTCCG CCGCGGCGGT GAACGCCACG GCCGGCAGGC 
GCCAGGGCCT GGAGCGGTCC GGCCGCCTCG CCGGACGGAA 
AGGTCGAGGT CGCGGGTCAG GCGGTGCAGT TCCCAGGCCG 
TGGACGACCG CGGTCACCGG GGTTTCCGGC ACTGTGCCCG 
GCGCCGTGTC CGCCGAGGTG TCCGGCGAGT TCCTCCGAAC 
TCGCCGTACG AGGCCGCGGC CGTGGTGGGC GCGGCGGGGA 
CGCCCGTCGG CCAGGCGCAG GTGCGGTTCG TCGAGGCGGG 
GGGGTGACCG TGTCGGTGGT CTCCACGAGC ACGAGCCGGC 
AGTGCGGCGA CGGCACCGGC GACGGGCCCG GCCTCGGCGG 
GCGGTCCTCG GGTCGTCCAG TGCGGTACGG ACCTCGTCGG 
ATGACGTCGG GCGTGGCGTC GTCGCCGAGG TCGGTGTACC 
GCCGCCGGGG CCCGGACGCC GGTCCAGGTG CGCCGGAACA 
CCCGTCGTGG CGGGGGGCCG GGTGATGAGC GAGCCGATCT 
TCGTCGGCGA GGTGCACGCG GGCGCCGCCC TCGCCCTCGC 
AGTTTCGTGG CGCCGCTGGT GTGGACACGG ACGCCGGTGA 
CCCGCGTTCT CGGCGGCCGC GCCGATGCTG CCCGCTTGCA 
GGGTGCAGTG TGTAGCGGGC GGCGTCCCTG GCGAGGGCGC 
CAGACGGTGT CTCCGTGGCT CCACGCGGCG GACATGCCGC 
TATCCCGCGT CGTCGAGTCG CTGGTAGAAG GCCGCGACGT 
GGCGGCCAGG GCCCCGGCGT GGTGGCCGGT TCGGTGGTGG 
GCGTGGCGGG TCCATGTCCG GTCGCCGTCC GTCCGGGCGT 
CCGGTGTCGT CGGGCGCGGC GACGGTCACG CGCACCTGGA 
ACCAGCGGTG TCTCGACGAC CAGTTCGTCG AGCAGGTCGC 
CGTCCGGCCA ATTCCAGGAA GGCGGGTCCG GGCAGCAGTA 
CCGGCCAGCC ATGGGTGGGT GGCCAGCGAG AACCGGCCGG
26041 TGAGCAGCAC CTCGTCGGAG
26101 CGGCGTCGAG TCCGAGGCCG
26161 CATGGTGGAA GGCGTATGTG
26221 CCCAGTCGAC GGGCACGCCG
26281 CTCCCCCGCC GCGGCGGAGC
26341 GGTGCGCGCT GACCTCGACG
26401 TGGCGAAGCC TACGGGGTGG
26461 CGATCCAGCG TTCGTCGGCG
26521 CCGCGACGAT CCGCTGGAGT

26581 AGTCGACGGC GATGCGGCGC
26641 CGACGGCGTC CGGGCGCCCG
26701 AGACGCCGTC GATCCGGGCG
26761 CCATCGCGCC GCGTCCGGCG
26821 GGCGGGCACC GTCCTCCAGG

26881 GGGAGTGTCC GATGACGGCG
26941 ACACCATGAC GGCCCAGCAG 
27001 CGTCGAGCAT GGCGATGGGG 
27061 GCATCCTGGC GGCGAACACC 
27121 GCGGTCCTTG TCCGGGGAAG 

27181 CGACGTCGTC GTCGAGCAGC
27241 CCGCGGCGAT GGCGCGCGGG
27301 GGACCTGGCC GTCGAGGGCC 
27361 TGGCGATCAG CGGCTCACCG 
27421 CCGGGTGGGC TTCCAGCAGG 

27481 CGGCGCGCCG CGGGCGGTCG
27541 CGCCGGCCGT CCAGTCGACG 
27601 TGCCGTGCCG CATGGCGAGG
27661 TGTGGCCGAT GTTGGACTTC
27721 AGGTGGCCAG CACCGCCTGT

27781 CCTCCACGGC GTCCACGTCC
27841 CCCGCTCCTG CGAGGGCCCG
27901 CCGCCGAACC CCGGACAACC

27961 TCTCGACGAT CAGCACACCG
28021 ACGCCTTGCA GCGCGCGTCG 

28081 ACGGCGAGGC CATCACCGTG
28141 GTGACTGCCC GGCCTGGTGC 
28201 CCGCCGGACC CTCCAGACCG 
28261 TGCCGGTCGC GCCGAAACCG 
28321 CCATGAACAC GCCGGTGTCG 

28381 GCGCCTCCCA CGAGGTCTCC
28441 GCGGACTGAT CCCGAAGAAC

28501 GACGCACGGT CGACGTGCCC
28561 AACCACGGTC CGTCGGAAAC
28621 AGTCCTCCGG CGACGCGACC

28681 GCTCGTCCTG CCGGACGGCC
28741 GCGCCGCGGT GAGCTTCGCC 
28801 CGGGCAGCCG TACGCCCGTC 
28861 AGTCGACGCC GAGTTCCTTG
28921 CGAGTACGGC CGCGGTGCAC

28981 CGGAGAGCCG CGCGATCCGG
29041 CCCGGCGCGG TGCGCCCAGC 
29101 GCGCCGGGTC CGAGGACCGC
29161 GCGCCGTCAC GCCGTCGCGG 
29221 GTTCCCACAG GCCCCAGGCC 

29281 CCAGCGCGTC GAGGAACGCG
29341 CACCGGCGGC CGACGAGTAG
29401 GCAGGTGCCA CGCGGCGTCC
29461 GCGCGGTGAG GACGCCGTCG
29521 GCGCCGGGTC GATCCCCGCC 

29581 CGATCGCCGT GACCTCGGCG
29641 GCAGCCGGCG CACGCCGTGG 
29701 CGGAGCCACC GGTGACGAGC 
29761 GGACCGCCGG GGCCAGACGG 
29821 CATCGAGCGC GGTGGCCGCT
TCGGGGAGCG CCACCGACGC GGCGAGCAGC GGGTGGTCGA
GAAGCGTCCG TGCCGGCCGC GGTCTCGATC CAGTAGCGCT 
GGCAGGTCGT GTGCCGTCGC CGTCGCGGGG ACGACCGCCG 
GTTGTGTGCG CCTCGGCCAG CGCGGTGAGC AGCCGGTGGA 
GTGGCGACGG TCGCGCCGTC GATCGCGGGC AGCAGCACGG 
AACACGGTGT CACCCGGCTC GCGGGCAGCG GTCACGGCCG 
CGCATGTTGC GGAACCAGTA CTCGTCGTCG AGCGGCGCGT 
GTGGAGAACC ACGGGATCTC GGGCGTGCGC GAGGTGGTGT 
TCGTCGTACA GCGGGTCGAC GAACGGGGTG TGGGTCGGGC 
ACCCAGACGC CGCGGGCCTC GTAGTCGGCG ATCAGCGTTT 
GCGACGGTCG TGGTGGTGGC GCCGTTGCGG CCCGCGACCC 
GCATCCGCCT CGACGTCGGC GGCCGGGAGC GCGACCGAGC 
AGTTCGCGCA GGAGCAGGAG AACGCTGCGC AGCGCGACGA 
GTGAGCGCTC CGGCGACACA GGCCGCGGCG ATCTCGJCCT 
TCCGGGCGTA CGCCCGCGGC CTCCCACACG GCGGCCAGCG 
ACGGGGTGCA CGACGTCGAC GCGGCGGGTC ACCTCCGGGT 
TCCCAGCCCG TGTGCGGGAT CAGCGCGTCG GCGCATTGGC 
GGGGAGGCCG CCATCAGTTC GACGCCCATG CCGCGCCACT 
ACGAAGACGG TGCGCGGCTC GGTGAGCGCC GTGCCGGTGA 
ACGGCGCGGT GCGGGAACGT CGTACGCCTG GCGAGCAGGC 
TCGTGGCCGG GACGGGCGGC GAGGTGCTCG CGGAGTCGGC 
GTGGCGGTCC GCGCCGAGAC GGGCAGTGGT GTGAGCGGCG 
GGCTTCGAGG CCGACGGCTC CTCGGCCGGC GGCTCCCCGG 
ACGTGGGCGT TGGTGCCGCT GACGCCGAAG GAGGACACAC 
GTCTCGGGCC AGGGCCGGGC ATCGGTGAGG AGTTCGACGG 
TGCGAGGACG GCGTGTCCAC GTGCAGGGTG CGCGGCAGGG 
ACCATCTTGA TGACACCGGC GACACCCGCG GCGGCCTGAG 
AGCGAGCCCA GCAGCACCGG GGTGTCGCGC CCCTGCCCGT 
GCCTCGATGG GATCGCCCAG CCTGGTGCCG GTGCCGTGCG 
GCCGGGGTGA GCCCGGCGTT GGCCAGGGCC TGCCGGATCA 
TTCGGCGCCG ACAACCCGTT GGAAGCACCG TCCTGGTTGA 
GCCAGCACAC GGTGGCCGTT GCGCTCGGCA TCGGAGAGCC 
GACCCCTCGG CGAAACCGGT GCCGTCAGCC GCATCCJCGA 
GGCGCGAGAC CCCGCTGCTG GGAGAACTCG ACGAAGCCGG 
ACGCCGCCGA CCAGGGCGAG CGAGCATTCG CCGGAGCGCA 
AGCGCCACCA GCGACGACGA ACACGC03TG TCGACCGTGA 
TAGAAGTACG ACAGCCGACC GGACAGCACA CTGGTCTGGG 
CCCAGGTCGG TGCCGAGTCC GTACCCGTCG GAGAAGGCGC 
CTTCCGCGCA GCGACTCCGG GAGGATCCCG GCGTGTTCCA 
AGGACCAGAC GCTGCTGCGG GTCCATCGCC AGCGCCTCAC 
GCCGCGTCGA AGTCCGCCAC CCCGGCGAGG AAGCCACCAT 
GGATGATCCG GATCGGGATC GTACAGCCCG TCCACGTCCC 
GCCGTGATCC CGTCACCACC CGACTCCAGC AGCCGCCACA 
CCACCCGGCA GCCGGCAGGC CATCCCCACG ATCGCCAACG 
GCGGTCGTGG TGCGGGTCGG CGATGCCGTC CGGCCGGACA 
GCGACGGCGC GCGGCGTCGG GAAGTCGAAG ACCGCGGTGG 
GCCTCGGTGA AGGCGTTGCG CAGCCGGATC GCCATGAGCG 
AACGTGGCGG TCGCCTCGAC CCGTGCGGCA CCGTCGTGGC 
TGCCGGACGA CGGCGAGCAC GTCCTTTTCG GCGTCCGCGG 
TCGGCGAGGG TGGTGGCGCC GGCCGCCCGG CGCCGCGGCT 
AGGGGCGAGC TGCCGAGGCC GGCCGGGTCG GCGGCGACCA 
AACGCCGCGT CGAACAGCGT CAGTCCGCCT TCGGCGoTCA 
CGCATGCGGG CGCCGGTGCC GACCGTCAGC CCGCTCTCCG 
ACGGACAACG CGGGCAGTCC GGCTGCCCGG CGCTGTTCGG 
TTCGCGGCCG CGTAGTTGCC CTGTCCGGGG CTGCCGAGCA 
AGGACGAACG CGGCCAGTTC CGTGTCCTGG GTGAGTTCGT 
ACCTTCGGGC GCAGCACCGT CTCGAGCCGG TCGGGGGTGA 
TCGAGGACGG CCGCGGTGTG CACGACGGCC GTGAGCGGGT 
AGTACGGAGG CGAGTTCGTC CCGGTCGGCG ACGTCGCAGG 
CCGGGCACGT CGCTCGCCGT GCCGCTGCGC GACAGCATCA 
CGTTCGACGA GGTGGCGGCT GATGATGCCG GCCAGCGTCC 
ACGGTGCCGT CCGGGTCGAG CGCCGGAGCG TCACCCGCCG 
CGGGCGTACA CCTGGCCGTC ACGCAGCACC ACCTGGGGCT 
GCGAGCAGCG GCTCGGCGGT GTCCGGGGCG GCGTCGACGA
29881 GGACGATCCG GCCGGGGTGT
29941 ACGCGAGACC GGGCCCGGTG
30001 GGAAGCGCTG CACGGCGGTC 
30061 CACCGCCGCC GCCGTGCGCG 
30121 GGCCGGTCGT CGCGGTCGTG
30181 GGCCCGGAAC GGCTCCCGTG
30241 CGGGCCCGCC GGTCTCGTCC 
30301 CCGTGGCGCC GGTGGCGTGG 
30361 CCGCGGCGAG GCGGAGTGCG 

30421 CGGCGAGCTG TCCGTCGGCG
30481 CGGCGCGCGG GCGGGGCAGC
30541 CGATGTCGTC GGGGTCCACC 
30601 GCACGGCCGG GGCCGTCCGC 
30661 CCCCCGCCGC GTGCCGCGTG 

30721 TCACCGTGAC GGAGAGCGCG
30781 TGAACGTGTC GAGGGCGCCG 
30841 GGGCCGCGGC GGGCAGCACC 
30901 CGACCCGGCC GGTGAGCACC 
30961 CCGGGTGCGC GACCGGCGTC 

31021 GCCAGTAGCG GACCCGCTCG
31081 GGTCGATGAC CTTCGGCCAG 
31141 TCAGGGCGGA TCGCGGTTCG 
31201 TCAGGCTCCG GTCCGGGCCG 
31261 CCCCGAACCG GACGGTGTCG 

31321 CGCCCGCGGC CATCGGGATC
31381 ACTCCTCGAG CATCGGCTCC 
31441 TGAAGCGGCC GAGCCGGGCC 
31501 TCGACGGGGG CCCGTTGACC 
31561 CCCGTTCCGA CGCGATCACG 

31621 GGGCCCGTGC GGACACCAGC
31681 CGGCGGCCAG CTCGCCGATC 
31741 CGAGCTGTGC GCCGAGTGCG 
31801 CGTGGAGGTC GAGCCCGGGG 
31861 CGAAGACGTC GTAGGCGGCG

31921 CGGAGAAGAG CCACACGAGG
31981 CGATCAGCGC GGCCCGGTGC 
32041 GCTCGTCCTC CTCGCCGGTG 
32101 CCTGCGGGGT GCGTGCCGAG 
32161 GTTCGGGGGC CGGTCGGGGG 

32221 CGAAGGAGGA CACCCCGGCG
32281 TGAGGAGTTC GACGGCGCCG 
32341 GGGTGCGCGG CAGGGTGCCG 
32401 CCGCGGCGGC CTGAGTGTGG
32461 CGCGATGCTG CCCGTAGGTG

32521 TCCCGGTGCC ATGCGCCTCG
32581 GCGCCTGCCG GATCACCCGC 
32641 CACCGTCCTG GTTGACCGCC 
32701 CGGCGTCGGA GAGCCTCTCG 
32761 CAGCGGCATC CGCGAACGCC 

32821 AGTCCACGAA GCCGGACGGC
32881 ACTCCCCCGA GCGCAGCGAC 
32941 CCGTGTCCAC CGTGACCGCC 
33001 GCACACTGGT CTGGGTGCTG 
33061 CGTAGAAGTA GCCGCCCATG 

33121 TCCCGGCGTG TTCCAGCGCC
33181 TCGCCAGCGC CTCACGCGGA 
33241 CGAGGAAGCC ACCATGACGC 
33301 GCCCGTCCAC GTCCCAACCA 
33361 CCAGCAGCCG CCACAAGTCC 

33421 CCACGATCGC CAACGGCTCG
33481 TGGCCCGCGC GCCGGCCAGT 
33541 CGAAGACGAG CGTAGCGGGC 
33601 CGACGCCGGT CAGCGAGTCG 
33661 GGGCGTCGCG GTGGCCGAGC 
TCGGCCTGCG CGGTCCGCAC CAGTCCGGCG GCCGCGGCCG
TGGACGGCCA GGACCGCGTC GGCGTACCGG TCGTCGGTGA 
AGGACGCCGG CGCCCAGTTC GCGGGTGTCG TCGAGCGGGG 
GGGAGGATCA CCACGTCCGG GACCGTCGGG TCGTCGAGGC 
GGCGGCAGCT CCGGGAGCTC GGCCAGCACC GGGCGCAGCA 
ATCGTCAGGG GGCGCCTGCG CACGGCGCCG ATGGTGGCGA 
GCGAGGTGTA CGCCGTCAGC GGTGACGGCG ACGCGTACCG 
ACGCGGACGT CGTCGAACGC GTACGGAAGG TGGTCCCCTT 
GCGCCGAGCA GCGCCGGGTG CAGGCCGTAC CGTCCGGCGT 
AGGGCCACTT CCGCCCAGAC GGCGTCGTCG TCGGCCCAGA 
GCGGGCCCGT CCGTGTACCC GGCTCGGGCC AGACGGTCGG 
GGCCGGGCCG TGGCGGGCGG CCACGTCGAC GGCATCTCCC 
GGGTCGGGGG CGAGGATTCC GTGCGCGTGC TCGGTCCACT 
TGCACGGTGA CCGCGCGGCG GCCGTCCGCC CCGGGCGCCC 
AGCGCACCGG ACCGCGGCAG CGTGAGGGGG GTGTCChCGG 
CAGCCGGCTT CGTCGCCCGC CCGGATCGCC AGATCCAGGA 
GCGAGGCCGT GCAGGGAGTG CGCCAGCGGA TCGGCGGCGT 
AGGTCGCCGG TGCCGGGCAG GGTGACCGCC GCGGTCAGCG 
TGTCCGGCCG GGGCCGCGTC GCCCGCGGTC TGGGTGCCGA 
AACGGGTACG TCGGCGGGTG CGAGGCGCGT GCCGGCGCGG 
TCGACCGTGA CGCCGTCGGT GTGCAGCCGG GCGAGCGCGG 
TCGTCGGCGT GCAGCATCGG GATGCCGTCG ACGAGTCGGG 
ATCTCCAGGA GCACCGCCCC GTCGTGCGCG GCGACCTGTT 
CGGACCTGTC GTACCCAGTA CTCCGGCGTG GTGCAGGCGG 
CTCGGCTCGT GGTACGTCAG GCTCTCCGCG ACCTTGCGGA 
ATCCGCGCCG AGTGGAACGC GTGGCTGGTC CGCAGGCGGG 
GCGACGTCGA GCACCGCCTC CTCGTCACCG GAGAGCACGA 
GCGGCGATCT CCACGCCGTC CCGCAGCAGC GGCAGCGCGT 
GCGGCCATCG CCCCGCCGGA CGGCAGCGCC TGCATCAGGC 
CTGCACGCGT CCTCCAGGGA CCAGACGCCG GCGACGTACG 
GAATGGCCCA CGAAGGCGTC CGGGCGTACG CCCCACGCCT 
ACCTGGAGCG CGAACACCGC GGGCTGGGCG TACCCGGTGT 
GGCACGTCGA GGGCGTCCAG CACCTCGCGG CGAGTGCGGG 
GCCAGTCCGT CGCCCATGCC GGGACGTTGT GAGCCCVGTC 
CGGCGGTCCG GTTCTGCGGC GCCGGTGACC GTGTCGGTGC 
GGGAAGGCCG TGCGGGCGAG CAGGGCCGCG GCCACCGCGC 
GCGAGGTGGG CGCGCAGGCG GTGTACCTGT GCGTCGAGTG 
AGCAGCAGGG GCAGCGGTCC GGTGTCGGGT GCCGGGGCGG 
TGGCTTTCGA GGATGATGTG AGCGTTGGTG CCGCTAACGC 
CGCCGTGGGC GGTCGGTTTC GGGCCAGGGG CGGGCGTCGG 
GCCGTCCAGT CGACGTGCGA GGACGGCGTG TCCACGTGCA 
TGCCGCATGG CGAGGACCAT CTTGATGACA CCGGCGACGC 
CCGATGTTGG ACTTCAGCGA GCCCAGCAGC ACCGGGGTGT 
GCCAGTACCG CCTGCGCCTC GATGGGGTCG CCCAGCCTGG 
ACAGCGTCCA CATCCGCCGG GGTGAGCCCG GCGTTGGCCA 
TCCTGCGACG GCCCGTTCGG CGCCGACAAC CCGTTGGAAG 
GAACCACGCA CGACCGCCAG GACATTGTGG CCGTGCCGCT 
ACGATCAGCA CACCGGATCC CTCGGCGAAA CCGGTGCCAT 
TTGCAGCGGC CGTCCGGGGA GAGGCCCCGC TGCTGGGAGA 
GAGGCCATCA CCGTGACGCC GCCGACCACG GCGAGCGAGC 
TGCCCGGCCT GGTGCAGCGC CACCAGCGAC GACGAACACG 
GGACCCTCCA AACCGTAGAA GTACGACAGC CGACCGGACA 
GTGGCACCGA AACCGCCGCG GTCGGCTCCA GTGCCGi'ACC 
AACACGCCGG TGTCGCTTCC GCGCAGCGAC TCCGGGAGGA 
TCCCACGAGG TCTCCAGGAC CAGACGCTGC TGCGGGTCCA 
CTGATCCCGA AGAACGCCGC GTCGAAGTCC GCCACCCCGG 
ACGGTCGACG TGCCCGGATG ATCCGGATCG GGATCGTACA 
CGGTCCGTCG GAAACGCCGT GATCCCGTCA CCACCCGACT 
TCCGGCGACG CGACCCCACC CGGCAGCCGG CAGGCCATCC 
TCCTGCCGGA CGGCCGCGGT CGGGGTACGC CGCCGGGTGG 
TCGTCCAGGT GGGCGGCGAG CGCCTGCGCC GTGGGGTGGT 
AGCGTCAGGC CCGTCGCGTC GGCCAGCCGG TTGCGCAGTT 
AAGCCCACTT CCCTGAACGC GCGCGCGGGT GCGATGGCGT 
ACCGCGGCAG CGCTGGTACG GACGAGGTCG AGCATGTCGC 
33721 GCGCGGCCGG AGGTGCGGAC
33781 GGACCCGGTC GGACGCGGCG
33841 GGTCGGTGTG CAGGGCCGCG
33901 TGCCGTTGCG GGCGATGCGG
33961 CCGCGTCCCA CAGTCCCCAG
34021 GGGCGAGCGC GTCGAGGAAC
34081 ACGTGGCGGA TATGGACGAG
34141 CGTGCAGGTG CCAGGCGACG
34201 GCATGGTCGT CACGGCCGCG

34261 GCTGGGCGAC GTCGGCGACG
34321 CGTACCGCAC GCGGTCGTCC
34381 CGACCTCGGC GGCCTCGTGC
34441 CGGTGCCGCC GGTGACGAGG
34501 CGACACGGCG CAGACGGGCC

34561 CGCCGGCGGC GAGCCCGGCC
34621 CGACGCGGCC GGGATGCTCC
34681 CGGGATCGCC GGTACGGGTG
34741 GCCAGGTCTG CACGGTGGTG
34801 AGGTGCCCGG GTCGCCGGGT

34861 GCACGTCGGC GAGGTACGTC
34921 TCTCGAACAG CGCCTCGGCA
34981 GGACCGGTGA GCCGTGCTCG
35041 CGAGCAGCAC GCGCAGCGCG 
35101 ACGCCAGCCG GCGCCGCTCC 

35161 CGAGCAGCAC GGGGTGCAGC
35221 ACGCGTAGGC GCGGCCCTCC 
35281 ACGAGAGCGG CAGCGCGTCG 
35341 GCCAGTCCAC GGGCTCCGCC 
35401 GCGCCCAGGG GCCCGTGCCG

35461 CGGTTCCGAC GGTGGCCTGG
35521 CGATGGTCAG CTCCGCGATC 
35581 CCACGAGCGC CGAGCCGGGC 
35641 GCTGACGGCG TACCGAGACA 
35701 CCCACGAGCC GAGCAGCGGG 

35761 GGTCACGGCG GAACGGGTAC
35821 TGACGGGCAC GCCCCGGACC 
35881 CCTCGCCTCG CCGCAGTGTG 
35941 CCAGTGCGGT GGTGAGCACG 
36001 CCGCCAGGTG GCCGGTCGCG 

36061 AGGCGGCGTC CGCGGGCCGG
36121 CCGGCGTGCG CGGAGTGATG 
36181 CATGCGCGGT GTGCGACGCG 
36241 GCAGCTCCTC CACGGCGTCG 
36301 CGGCGACCTC CAGGCGCCCG 

36361 CCATGCCGCC CTGCCCGGCC
36421 TCGCGGCGTC GTCCAGGGTG
36481 AGTGGCCGAC GACCGCGGCC
36541 CCATCACCGC GAACGACGCG
36601 GCCGCTGGGC GATGACGTCC 

36661 ACTCGCGGAG CCGCCGGGCG
36721 CCCACTGGGA GCCCTGCCCG 
36781 TTCCCGTCAC GGCCCCCGGC 
36841 GCACGACCGC CCGGTGGCGC 
36901 CCGCGGCGCC AGTGAGCGGG 

36961 GGGCCGACAT CGGCCAGACC
37021 GTGCGGGCGC GGCGGGGGGC 
37081 CGAACGACGA GACACCCGCA 
37141 GCAGCAGCCG GATGTCGCCG 
37201 TGCGCGGCAG GACGCCGTGC 

37261 CCGCGGCCTG GGTGTGGCCG
37321 GTTCGCGCCC GTAGGCCACT 
37381 CGGTGCCGTG TGCCTCCACG 
37441 CACGCTGGAT GACGCGCTGC
37501 CGTCGGAGTT GACCGCGGAG
GTGCGCCGGA CGGCCGGCAC GAGGGTGCGT AGGACCGGCG
ACGGCGGCGA GGTCGAGCCG GATCGGCACG AGCGCGGGCC
TCGAACAGGG CGAGCCCCTG TGCGGCCGTC ATCGGGGTCA 
GCCAGGTCGG TGGCGGTCAG CCGCCCGCCC ATCCCGTCCG 
GCGAGCGAGA CGGCGGGCAG CCCCTGGTCG TGCCGGTGGC 
GCGTTGCCGG TCGCGTAGTT GGCCTGACCC GCGCCGCCGA 
TACAGGACGA ACGCGGCCAG GTCGAGATCG CGCGTCAGCT 
TCCGCCTTGA CCCGCAGCAC GGCGTCCCAC TGCTCCGGCC 
TCGTCGACGA TCCCGGCCAT GTGCACGACG GCGCGCAGCC 
ACTGCGGCCA GCTCGTCGCG GTCGACGACG TCGGCGGCCA 
TCCGGCGTGT CGCCGGGCCG GCCGTTGCGG GACACCACGA 
ACGGTGAGCA GGTGGTCCAC GAGGAGGCGG CCGAGCCCGC 
ACGGTCCCGC CGGTCAGCGG GGAGGTTCCG GTGGCCGCGG 
GCACGCGCTG TGCCGTCGGC GACCCGGACG TGCGGCTCGT 
GCTATGGCGG CGGGCGTGAT CTCGTCCGCT TCGATCAGGG
GTCTCCGCCG TCCGGACCAG GCCGCCGAGC GCTTCCrGCG 
GCCACGATGA GCCGGGATCG CGCCCAGCGC GGCTCGGCGA 
AGCAGGTCGC GGCCCAGCTC CCGGGTCCGG GCGCCGGGCG 
TCCACGGCCA GGACCACGAC CGGGGGGTGC TCGCCGTCGG 
CAGTCGGGGA CGGGTGACGC GGGCACGGGC ACCCAGGCGA 
TCGGGGTCGG CGGCCCGCAC GGTCAGGCTG TCGACGTCAA
TCCGTGGCGA CGATGCGGAC CATGTCGGGG CCGACGCGTT 
GTCGCGGCGC GCGCGTGGAT CCTCACGCCG GACCAGGAGA 
GGGTCCGTGA AGACCGTCCC GAGGGCGTGC AGGGCCGCGT 
CCGTACCGGG CGTCGGTGAG CTGTTCGGCG AGGCGGACCG 
CCCGTCCACA TCGCGGTCAT GGCCCGGAAC GCGGGCCCGT 
TAGAAGCCGG TCAGGTCGGC CGGGTCGGCG TCGGCGGGCG 
GGACCGCCAG TGTCCACGCT CAGCGCTCCG GTCGCACTGA 
GTACGGCTGT GCAGACTCAC CGACCGCCGT CCGGACACCT 
ATCTCCGTGT CGCCGTCGCC GTCGACCACC ACCGGCGCGA 
TCCGGCGTGC CGAGCCGGGC TCCCGCTTCG GCGAGCAGTT 
ACGATGACCC GGCCGTCCAC CTCGTGGTCG GCGAGCCAGG 
CCGCGGTGGC CAGCGCGCCC TCGCCGTCGG GCGAGGTCGA 
TGGCCGGACG TTCCCGCCGG TTCCGCGTCG ATCCAGTACC 
GTGGGCAGCG GCACCACCCG ACGCGTCGCG AACGACOAGG 
CAGAGCGCGG CGAGCGACCG AGTGAAGCGG TCCAGGCCGC 
CCGGTGACGA CCGTATGCGC ATGCCCGGCG AGCGTGTCCT 
GGATGCGCGC TGACCTCGAC GAACGCGCGG TATCCGCGGT 
GCGGCGAACC GAACGGTGCG GCGCAGGTTG TCGTACCAGT 
TCCAGCCACG CCTCGTCCAC GGTGGAGAAG AACGGGACGT 
CCGGCGAGAG CGTCGAGCAG CGCGCCGCGG ATCGTTTCGA 
TAGTCGACGG CGATCCGGCG GGCGCGGGGG GTGGCGGCCA 
GCCGCACCGG CGACAACGAT CGACGCGGGT CCGTTGACCG 
GCCCACACGG CGGCGTCGAA GTCGGCGGGC GGCACCGAGA 
AGTTCGGTGG CGACGAGTCG GCTGCGCACC GCGACGACCT 
AGCACCCCGG CGACGCAGGC CGCGGCGACT TCGCCCTGGG 
GGGGCGACCC CGTGCGCACG CCACAGCTCC GCCAGCGCCA 
GGCTGCACGA CATCGACCCG GTCGAACGCG GGCGCTCCGG 
AGCAGGTCCC ATCCGGTGTG CGGGGCGAGC GCCGTGGCGC 
AACACGGGCT CGGTGGCGAG CAGTTCGGCA CCCATGCCGG 
GGGAACGCGA ACACGACACG TGTGTCGGTG ACGTCGGCGG 
ACTTCGGCAC CACGGGCGAA CGCCTCCGCC TCTCGGGCCG 
ATGGCC<3TCC GGGTGGTGGC GAGCGAGTGG CCGACCGCGG 
GCCAGCTGTC CCGCGACGTC CCGCAGTCCC TCCGGGoTCC 
ACGTCCTCGG GCACCGGCTC GGCTTCGGGT GCGGACACGG 
CCGGCCTCCA GGACGACATG GGCGTTGGTG CCGCTGATGC 
CGCCGGGCGC GCCCGGTGAC CGGCCACGGC TCACTGCGGT 
TCCCAGTCGA CGTGCCGGGA CGGCTCGTCG ACGTGCAGCG 
CGCATCGCCA TGACCATCTT GATGACGCCG GCGACGCCGG 
ATGTTCGACT TGAGCGAGCC GATCAGCAGC GGATGCACGC 
TGCAGGGCCT GGGCCTCGAC GGGGTCGCCG AGACGGGTGC 
GCGTCGACGT CACCCGGCGC CAGGCCGGCG TCGGCGAGCG 
TGCGCAGGCC CGTTCGGGGC GGACAGCCCG TTCCACGCGC 
CCGCGCACCA GCGCCAGCAC GGGGTGGCCG TGGCGGGTGG 
37561 CGTCGGAGAG CCGCTCCAGC

37621 CGGTGTCCGC GAAGGCCTTG

37681 CGACGAACCC GGTCGTCGTC

37741 CCCCCGAGCG CAGCGACCGC

37801 TGTCGACGGT GACCGACGGG

37861 CGCTGGTCGG CGTGCCGGTC

37921 CGGTGAACGC GCCCATGAAT

37981 CCGCTCGTTC GAACGCCTCC
38041 CCAGCGCCTC ACGCGGGCTG

38101 GGAAGCCGCC GTGACGCACG

38161 CGGCGAGGTC CCAGCCGCGG 

38221 CCAGCCGCCA CAGGTCCTCC 

38281 CGATCGCCAG CGGCTCGTTC 

38341 CAGGGGCCGG CTCACCCCGC 

38401 GGTGGTCGAA GACGGCCGTC

38461 GCAACCGGAC ACCGCTGAGC

38521 TCTCGGAGGC GTCGGCGTGG 

38581 GGTCACGATC GCGGTCGCGG 

38641 GCTCGGTCCG CTGCCGGACG

38701 CGGCGAGGCT CGCGTCGATG

38761 CGCGCACCCG CTGCCGGTCG

38821 ACATGCCCCA GGCGATGGAG

38881 CGTCGAGGAA GGCGTTGGCG

38941 CGGCGCTGGA GTAGAGGACG
39001 GCCAGGCGGC GTTGGCTTTG

39061 CGAGGATGCC GTCGTCGAGG 

39121 TGTGGGCGAG GGTGGTGGCG 

39181 CGGGGGTGGT GTCGGGGGGT 

39241 GGCGGGCGAG GATGCCGGCG 

39301 GGTTGAGGGG GGTGGTGGTG

39361 GGAGGGTGTG GTGGGTGAGG 

39421 GGAGGGGAGT GTGGGGGTGG 

39481 GGGCGGTGCG GGTGAGGCCG 

39541 TGAGGGTGTG GTCGGTGGTG 

39601 GGGTGTGGGC GCGGGTGGGT

39661 CGTGTCCCTC GGGCAGGTCA 

39721 GGAGCGGGTT CGGCCCCGAC 

39781 ACACGACAGG ACGGCCATCC 

39841 TGAGGGCGAC GCGCACCGCG 

39901 ACGGCAGCTC GATCCCGCCG

39961 GTGCCGGATG CACACCGAAA

40021 CGGCATACAC GGTGTCACCA

40081 ACTCATAACC GGCATCCCGC 

40141 TGGCCGGCGG CCACTGCGAG 

40201 GGGTCAGGGT GCCGCTGGCG

40261 CGGTCACCGG CCGCCGTCCG

40321 CTGCGGTCAC GGGCACCACG 

40381 CGGTCTCGTC ACCGGCCCGG

40441 TGCCCCGCAC CGCGTGATCA 

40501 GAACAACACC ACCACCGTCG

40561 CCGCCCCGGT CAGCCCGGCC

40621 GCCTGTGCTC GAACGCGTAG 

40681 CCGTGCCCCA GTCCACCCCC 

40741 GCTCCCAGCC ACCGTCACCA

40801 GCAGCAGCAC CGGATGGGCA

40861 CCGCATCCAG CGCGACAGGG 

40921 CGGTCACCCA GGCGCTGTCC

40981 TTCCCTTCAG TACCTCAGCG 

41041 CGTAGTCGAC CGCGATACGA 

41101 CCTCCACCGC CGACGGGTCC

41161 TCCACACACC CTCGACCAGA 

41221 CCCGGCCGGC CAGCCGCGCC 

41281 CCTCCAGGCT GAGGGCTCCG 

41341 CCACAGCGTC CGGCACGACC 
ACCAGGACAC CGGCGCCCTC GGCGAAGCTC GTGCCGTCCG
GCACGGCCGT CGGGGGCGAG CCCGCGCTGC CGGGAGAACT 
GCCATCACCG TGACACCGCC GACCAGGGCG AGCGAGCACT 
GCGGCCTGGT GCAGCGCCAC CAGCGACGAC GAACACGCCG 
CCCTCCAGAC CGAAGTAGTA CGAGAGCCGC CCGGAGAGAA 
GCCCCGAAAC CGCCCAGGTC CACGCCCGCG CCGTAGCCCT 
ACGCCGGTGT CGCTGCCGCG GACGCTTTCG GGCAGGATGC 
CACGACGCTT CGAGGACCAG ACGCTGCTGC GGGTCCATCG 
ATCCCGAAGA ACGCGGCGTC GAAGTCGGCG GCGCCGGTGA 
GAAACCTTGC CGACCGCGTC GGGGTTCGGG TCGTAGAGCG 
TCGGCGGGGA ACTCGGTGAT CGCGTCCCCG CCGGAGTCGA 
GGTGACCGCA CGCCACCGGG CATCCGGCAC GCCATGGCCA 
CCCGCCACCG TCGGTGCGGG CACTGTCGCC GCCGGAGCGG 
CGTTCCTCAT CCAGGCGGGC GGCGAGCGCG GCCGGTGTCG 
GCGGAGAGCC GTACCCCCGT CGTCTCGGCG AGGCTGTTGC 
GAGTCGATGC CGAGGTCCTT GAACGCCGTC GTGGGCGTGA 
CCGAGCACGG CGGCCGTGGC CGCACACACG ATGGCCAGCA 
TCGCGGTCGC GGTTGTCCTC CGCACGGGCG GCGATGCGGG 
GGCTCGGTGG GAATCGCCGC GACCATGAAC GGCACGTCCG 
AAGTGGGTGC CCTCGGCCTC GGTGAGCGGC CGGAACCCGT 
GCGTCGTCAA GTTGTCCGGT GAGGGTGCTG GTGGTGTGCC 
GTGGCGGGTT GGCCGAGGGT GTGGCGGTGG GTGGCGAGGG 
GCGGCGTAGT TTCCTTGTCC GGGGCTGCCG AGGACGGCGG 
AAGTGGGTGA GGGGTTGGTT TTGGGTGAGG TGGTGCAGGT 
GGGTGGAGGA CGGTGGTGAG GCGGTCGGGG GTGAGGGCGT 
GTGGCGGCGG TGTGGAAGAC GGCGGTGAGG GGTTGGGGGA 
AGTTGGTGGG GGTCGCCGAC GTCGCAGGGG AGGTGGGTGC 
GGGGTGCGGG AGAGGAGGTA GGTGTGGGGG TGGTTCAGGT 
AGGGTGCCGG AGCCGCCGGT GATGATGATG GCGTGTTCGG 
GGTGGGGTGG TGGTGTGGAG GGGGGTGAGG TGGGGTCGGT 
CGGAGGTGGG GGTGGTCGAG GGTGGCGAGT TGGGCCAGGG 
TCGGTTTCGA TGAGGCGGAT GCGGTGGGGG TGTTCGTTCT 
GTGACGGTGG CGCCGGCGGG GTCGGTGGTG GTGTGGACGA 
GTGAGGTGGT GTTGCAGGGC GGTCAGGACG CGGGTGGCGC 
ATGTCCTCGG GGTCGTCGGG GTGGGCGGCG GTGATCAGGA 
CCGTCGTAGA CCGCCTCGGC GACCGCGAGC CACTCCAACC 
GGGGTGTCGG CCCGCTCCCT CAGCACCAGC GAGTCCACCG 
GGGTCGGCCA CGCGCACGGC GACGCCGGCC TCCCCCCGGG 
GCGGCCCCGG TGGCGTTCAG GCGCACGCCC GTCCAGGAGA 
CCCGCGTCGA GGCGCCCGGC GTGCAGGGCC GCGTCGAGCA 
CCGTCCGCCT CGGCGGCCTG CTCGTCGGGC AGCGCCACCT 
TCACGCCAGG CAGCCCGCAA CCCCTGGAAC GCCGACCCGT 
AGTTCGTCAT AGAACCCCGA GACGTCGACG GCCGCGGCCG 
AACGGCTCAC CGGAAGCGTT GGAGGTATCC GGGGTGTCGG 
TGCCGGGTCC AGCTGCCCGT GCCCTCGGTA CGCGCGTGGA 
GCCTCATCGG CCCCTTCCAC GGTCACCGAC ACATCCACCG 
AGCGGGGATT CGATGACCAG TTCATCCACC ACCCCGCAAC 
ATGACCAGCT CCACAAACGC CGTACCCGGC AGCAGAACCG
GCCAGCCAGG GATGCGTACG CAATGAGATC CGGCCGGTGA 
TCGGCGGGCA GTGCTGTGAC GGCGGCCAGC ATCGGATGCG 
GCGGACAGGT CGGTGGCACC GGCCGCCTCC AGCCAGTACC 
GTGGGCAGAT CCAGCAGCCG CCCCGGCACC GGTTCGACCA 
GCACCCAGAG TCCACGCCTG CGCCAACGCC CCCAGCCACC 
GTCCGCAACG ACGCCACCGT GCGGGCCTGT TCCATCGCCG 
CTGCACTCCA CGAACACCGA CCCGTCCAGC TCCGCCACCG 
CGACGCAGGT TCCGGTACCA GTACCCCTCA TCCACCGGCT 
ACGGTCGACC ACCACGCCAC CGACCCGGTC CCGCCGGAAA 
AGTTCGTCCT CGATGGCCTC CACGTGAGGC GTGTGGGAGG 
CGCACCCGCA CCCCATCAGC CTCATACCGC GCCACCACCT 
CCCGCCACCA CCGTCGAAGC CGGACCATTA CGCGCCGCGA 
CCCACCTCAC CGGCCGGCAA CGCCACCGAA GCCATCGCCC 
GCGATCACCC GACTGCGCAA CGCCACCACG CGGGCGGCGT 
GCCACACACG CCGCCGCGAT CTCCCCCTGC GAGTGTCCGA 
CCATGCGCCT GCCACAGCGC GGCCAGGCTC ACCGCGACCG 
41401 CCCAGCTGGC CGGCTGGACC ACCTCCACCC GCTCCGCCAC
41461 CCCGCACATC CCAGCCCGTG TGCGGCAACA ACGCCCGCGC
41521 CGAACACCGC GGAACGGTCC ATGAGTTCCA CGCCCATGCC 
41581 CGGGGAAGAC GAACACCGTA CGCGGCTGAT CCACCGCCAC 
41641 CCAGCAGCAC CGCACGGTGA CCGAAGACAG CACGCTCACG
41701 CGGCCACATC CACCCCACCC CCGCGCAGAT ACCCCTCCAG 
41761 GACTCACCTC ACCACGAGCC GACACCGGCA ACGGCACCAA
41821 CACGCGACGG CCCAGGAACA CCCTCCAGGA TCACGTGCGC 
41881 ACGACGACAC ACCCGCATGC GGTGCCCGAT CCGACTCGGG 

41941 GCAGCTCGAC CGCACCGGCC GACCAGTCCA CATGCGACGA
42001 TCTTCGGCGC GATCCCATGC CGCATCGCCA TGACCATCTT
42061 CAGCCGCCTG CGCATGACCG ATGTTCGACT TGACCGAACC 
42121 GGTCCTGCCC GTAGGCCGCG AGGACGGCCT GCGCCTCGAT
42181 CGGTGCCGTG CGCCTCCACC ACGTCCACAT CGGCGGCGCG

42241 CCTGCCGGAT CACGCGCTGC TGGGCGACGC CGTTGGGGGC
42301 CGTCCTGGTT CACCGCCGAG CCGCGGACGA CCGCGAGAAC
42361 CGTCGGAGAG CCGCTCCAGC ACGAGAACGC CGACGCCCTC
42421 CCGCGTCGGC GAACGCCTTG CACCGTCCGT CCGGGGAGAG
42481 CCACGAGCTC TGCGGTGTTC GCCATGACGG TGACACCGCC 

42541 CCCCGGCCCG CAGTGCCTGT GCCGCCTGGT GCAGGGCGAC
42601 TGTCGACCGT GACCGCCGGG CCCTGAAGTC CGTACACGTA 
42661 CGCTCGTCTG CGTCGCCGTG ACACCGAGCC CGCCCAGGTC
42721 GGTTGAACGC GCCCATGAAC ACGCCGGTGT CGCTCTCCCG
42781 CGGCGTTCTC GAACGCCTCC CAGGAGGTCT CCAGGATCAG

42841 CCAGCGCCTC GTTCGGACTG ATGCCGAAGA ACGCGGCGTC
42901 ATCCGCCGTG GCGTGTCGTG GAGCGGCCGG CCGCGTCCGG
42961 CGACGTCCCA GCCCCGGTCG GTGGGGAACT CGGTGATCGC
43021 GCCGCCACAG GTCCTCCGGC GAGGCGACCC CGCCGGGCAG
43081 TCGCGACGGG GTCGCCGGAG CCGAGGGTCT GGGCGGTCGC

43141 CGGCGAGGTG GGCGGCGAAC GCACGCGGAG TGGGGTGGTC
43201 CCCGCAGACC CGTCCGCGCG GCGACGGTGT TGGTGAACTC
43261 GGCCGTTCTC GCGGAACGTG CGGTCCGGGG AGCAGTGTCC
43321 CGGTGGCGAC GCTGTCGCGG ACCAGGTCGA GCAGTACGTC
43381 CGGCGAGGCG GTTCGCCCAC TCCTGTTCCG TGGCGTCGGG

43441 CGGTGAGGAT CGGCGGCGTG GCGCCCGCCA TCGTCGCGGC
43501 TCCGGGCCAC GATGTACGAG CCGCCGCCCG CGATGGCCTT
43561 GCGCCGGCCG TTCGATGCCG GGCAGCGCGC GGACGGTGAC
43621 CCCGTGGCCG GGTGTGGGCG TCGGCGCCGG CCGGGCCGTC
43681 CGCCGGGGTT CGCGGCTTCC TCGGCTGCGG TGGTCACGTG

43741 GGAGCAGGCC GGCGACGGTG TCGGCGTCCT CCCCGGTGAC
43801 CGATCGGAGG CGGCACGGTG AGGACCATCT TGCCGGTGTG
43861 CGAACGCGTC CCGCGCACGG CGGATGTCCC ACGGCTGCAC
43921 CGCGGTCGAA CAGGTCGAGG AGCAGTTCGA GGATCTCCCG
43981 CGGCCAGGTC GAACGGCTGC TGGGCGGCGT GGCGGATGTC

44041 ACCGGCCGCC CGGTGCGAGC AGGCCGATGG ACGCGTCGAG
44101 TGAGCACGAC GTCGACCGGC GGGAAGGTGT CGGCGAACGC
44161 CATGGTCGGT GTCGAAGCCG TCGGCGTGCA GCAGGTGTTG
44221 CGTACACCTC GGCGCCGAGG TGGCGGGCGA TCCGGGTCGC
44281 TCGCGGCGTG GACCAGGACC TTCTGGCCGG GTCGCAGCTC

44341 ACCAGGCGGT GGCGAACACG ATGGGCACGG ACGCGGCGAT
44401 GGATCCGTGC GACCAGCCGC CGGTCCGCGA CCACGCTGCG
44461 GACCGAACAC GCGGTCGCCG GGGGCCAGGT CGTCGACGCC
44521 TGCCCGCGGC CTCCCCGCCC ATCTCGCCCT CGCCCGGGTA
44581 CGTCGCGGAA GTTCAGCCCC GCGGCGCGGA CGTCGATGCG

44641 GCGCGGCGGG ACGTCGAGCG GGGCGACGAC GAGGTCGCGG
44701 GCGCAGCGCC CACTGGCGCG GTCGGCAGGG GGGTGGTGTC
44761 CGTAGGCCAC GCCGGCCCGC AGCGCGATCT GGGGTTCGCC
44821 CGAGGTCGTC ATCGCCGTCC GTGTCCACCA GCACGAACGA
44881 GGCGCAGCGC CTCGTCCCAG AGCCGGGCCT GGTCCGCGTC

44941 CGCGCACCGC GCGGCGGGTG ACGACCGTCC GGCGGGGTGA
45001 GCCGCTCCCA GACCAGTTCG CACAGCGTGG CCTCGCCACT
45061 CCGGCAGCCC CGCGAGCCGC GCGCGCTGGA CCTTGCCCGA
45121 TGACGTGCCA GATCTCGTCG GGCACCTTGA AGTAGGCGAG
45181 GGATCGCCTC GGCGGGGACG CGGGGGCCGT CGGAAACGAC
ATCCGACCGC GACAACATCT
ACACTCCTCC ATACGAGCCG 
CACCCACTGG GCACCCTGCC 
ACCCATCACC CGGGCATCAC 
CACCAACCCC TGCGCGACCG 
CCGCTCCACC TGCCCCCGCA 
CCCATCACCA CCCGACTCCA 
GTTCGTACCG CTCACCCCGA 
CCACGGCCTC GCCTCGGTGA 
CGGCTCGTCC ACGTGCAGCG 
GATGACACCG GCGACACCCG 
GAGGTAGAGC GGCGTGTCGC 
CGGGTCGCCC AGCCGCGTGC 
CAGTCCGGCG TTGACCAACG 
GGACAGTCCG TTGGAGGCAC 
GGTGTGCCCG TTGCGCTCGG 
GGCGAAGCCG GTCCCGTCCG 
TCCGCGCTGC CGGGAGAACT
GACCAGCGCC AGGGAGCACT 
CAGCGACGAC GAGCACGCCG 
CGAGAGGCGC CCGGACAGGA 
CCGGCCGACG CCGTAGCCCT 
GAGCCTGTCC GGCACGATGC 
GCGCTGCTGG GGGTCCATCG 
GAACCCGGCG CCGGCCAGGA 
GTCCGGGTCG TACAGCGCGT 
CTCGGTACCG GCGGCGACGA 
TCGGCACGCC ATGCCGACGA 
GGGTGCCGCT GTCGCGGAGC 
GAACGCGGTT GACGCGGGCA 
GACGGTGGTG AGCGAGTCGA 
GGCGCCCGGC AGGCCCAGGA 
CTCCCGGCCC GCACGGGCCG 
CTCGGCCGGT CCGGTCAGTG 
CCGCGCCCCG GCGGAACCGG 
CTCGATCAGG TCGCCGGTGA 
GGTGGGGAGT CCCTCCGCGG 
GAGCAGGACG TGCACGAGCG 
GGTGAGGCCG GTCTCGTCGC 
CAGGACCGGC GCGTCCGGGC 
CCGGGCGTGG CTCATCCACG 
CGGCAGCGGG CACAGCTCAC 
CAGGCGCGCG <3GATCCACGT 
GGTCTTGCCC ATCTCGACGA 
GAGTTCACCG GTGAGCGAGT 
GGCGCTGCGG GAGTTCGCCA 
TTTGGCGGGA CTGGCGGTGG 
CGCCATGCCG ACACCGCCCG 
GCCCGCGTCG ACGAGGCCGT 
GGGGAACGAC CATCCCCGTG 
CCGGAACGCG TCCTGCACGA 
GGGTCCGACT TCGGTCACGA 
GGTGCCGAGC GCGATCAGCA 
GACCTCGCCG GCGGCCAGGG 
AGCGTTCCGG AGGCGGGCGG 
CGCGCGTACC AGCCGGvjGCA 
GAGCGAGGCC GCGGCGGGGA 
TCCGGGTTCG GCGGCCTGGC 
CGGGATCTCG <3CCGGGCCGA 
CGGGGTGCCG GGCAGGTCGC 
GCCGGTGGCG ACCAGATGGG 
CGCGGTGCGG <3GGATCGTGG 
CCGGCGGCGG CACTCGGCGA 
GTAGAGCACG GGTATGTCGC 
45241 CGAGGACGGG GTGCGGGCGG

45301 CGACGGTCTC GATCTCCCGG

45361 CCCGGCCGGT GATCGTCACG

45421 ACCAGCCGTC CACGAGCACC

45481 GGCTCGGCCC GCTCGCCCAC

45541 CCGGGTCGAC GAACCGCAGC

45601 GCGCATCCTC CAGGGTGTTG

45661 CGAGCAGGGG CACGCCGAAC

45721 ATCCGGCGAC CAGCGCCACG

45781 GGAGGTAGCG GTACATCGTC

45841 CGTCGAGGAC GTCACGCGCG

45901 GGACGGCGAG CAGGCAGAGG

45961 GTTCGTCGTC CTCGGTCAGC

46021 CGCTGCGCTG TGCGGAAACC

46081 TCCAGGCGGG TTCGTCCAGG

46141 CGAGGTCCTC GTAGGAGACG

46201 CGGTGCCGGT GCGGCGCACC

46261 CGGAGTCCGT CAGGAAGTGG

46321 CGACGGCGGC GGCGCGGGCG

46381 GCAGCATCGC GACCCGGTCG

46441 GGCCGGCCCG GAGCCGGAGT

46501 TCCGGTCGCC GCGTCGCTCG

46561 CCACACGCGC CATGGAAACA

46621 ACGAGTAGAC GCCGGCGACG

46681 CTACCGTGGC CGGCCTCCCC

46741 AATTGCCTTC CTGATGACCG

46801 TGTCACGGCG CCGTATTGCC

46861 GACGGTGCTC GGCCTGATCG

46921 TGCTCCCCGG ACCGCCGTGC

46981 GCACGCACAG CGCCCTGTCG

47041 CGCGTACCTG TTCGGTGTCG 

47101 GGCCCTCTAC ACGAACGTCT

47161 GACCTGGAAC TACGTCAGCG

47221 GGACTTCTGC GTGGGCCGCG

47281 GCCCGCGGCC ACCGGTATCG

47341 CCGGGGCGGA GTGCGGATCA

47401 GACGACGTAC GGTCCGCGGC

47461 GGGGGGCCGG CTGTTCATCT

47521 CGGTGATGTG ACCGGCCAGT

47581 GGAGAACCTG CGGCGCCACG

47641 CAAGGTCTAC GTCCGCCGCC 

47701 CCTGTCGAGC ACCGCGGCCG

47761 CGTCGAAATC GAAGGCATGG

47821 CTCGGCGGAT CCGCGAAGAG

47881 TCGTCCTTCG CACAGCGGCG

47941 TATAATCTCC CGCTCGTGCA

48001 GCGCTGGCGC TCGTCGTCGC

48061 GGCGAGCCCC TCCAGCGGGT

48121 GGCAGCGAGG AGGACGCCGC

48181 GCCACCGGGC CGTTGATCAG

48241 GCGGTGACCG TGCACCATGT

48301 CTCGCAGCCC ACTACACGGC

48361 CCGGTGCAGT ACGCCGACTT

48421 GACAGGCGTC TGGCCTACTG

48481 CCCACCGACC GTCCCCGCCC

48541 CCGCCGGCCG CGCTGGCCAC

48601 TTCATGACCC TGCTGGCGGC 

48661 GTGCTGGTCG GCACGCCCGT 

48721 ATGTTCGTCA ACACGCTCGC 

48781 CTCCTCGACC GCTGCCGGGC

48841 GAGAACGTCA TCGAACTCGT

48901 GTGOTGTTGC AGGTGCTGCG

48961 GAACCGTTCC GCACCGGACG

49021 GAGCCGGGTG GCGCGCTGAC
CCCGCCGCGG CGGCGTCCCG GACACCGGCC ACCTCCTGGG
GGGTGGATGT TCTCCCCGCC GCGGATGATC AGCTCCTTGA 
TGTCCGGTCT CGGCCTGACG TGCGAGGTCC CCGGTGCGGT 
TGGGCGGTCG CCTCCGGCTG GGCGTGGTAG CCGAGCATGA 
AGCTCGCCCT CCTCGCCGGG TGCCACGTCG GCGCCGGACA 
GACAGGCCCG GCACGGGCAG CCCGCACGAG CCGGGAACCC 
GCGGTGAGCG AGCCGGTCGT CTCGGTGCAG CCGTACGTGT 
GTCGCCTCGA AATCCCTGGT GAGCGACGCC GGCGAGGTGG 
CGCAGCGCGC GAGCCCGCGG CTCGCCGGAC ACGGCGCCGA 
GGCACGCCGA CGAGCACGGT GCTGGAGTGT TCGGCCAGGG 
ACGAAGCCGC CCAGGATACG GGCGGACGCG CCGACCGTGA 
TGGTGGCCGA GGCTGTGGAA CAGCGGGGCG GGCCAGAGCA 
CGCCAGGACG GCACGTCGCA GTGCATCGCG GACCACAGGC 
ACGCCCTTGG GACGGCCGGT GGTGCCGGAG GTGTAGAGCA 
CCGAGGTCGT CGCGGGGCGG GCACGGCGGC TCGGTCCCGG 
CAGTCCGGTG CCCGGCGCCC GACGAGCACG ACGGTGGCGT 
TGGTCGAGGT GGGTTTCGTC GGTGACCAGC ACGGTCGCGC 
GCGAGTTCGG CGTCGGCGGC GTCCGGGTTG AGCGGGACCG 
GCGGCGAGGT AGACCTCGAT GGTCTCGATC CGGTTGCCGA 
CCGCGGTCGA CGCCGGACGC GGCGAGGTGT CCGGCGAGCC 
TGCGTGTACG TCACGGCGCG TTGGGAATCC GTGTAGGCGA 
GCATGGATGC GGAGCAATTC GTGCAACGGC CGGATTGGTT 
CCTTTCTCTC GACCAACCGC ACAACAGCAC GGAACCGGCC 
CTAGCAGCGT TTTCCGGACC GCCACCCCCT GAAGATCCCC 
GGACGCTCAT CTAGGGGGTT GCACGCATAC CGCCGTGCGT 
ATGCCGGACG CCAGGGAAGG GTGGAGGCGT TGTCCATATC 
GCTTCGAGAA GACCGGATCA CCGGACCTCG AGGGTGACGA 
AGCACGGCAC CGGCCACACC GACGTGTCGC TGGTGGACGG 
ACACCACGAC CCGTGACGAC GAGGCGTTCA CCGAGGTCTG 
AGTCCGGCAT GGACAACGGC ATCGCCTGGG CCCGCACCGA 
TGCGCACCGG CGAGAGCGGC AGGTACGCCG ATGCCACCGC 
TCCAGCTCAC CCGGTCGCTG GGGTATCCCC TGCTCGCCCG 
GTATCAACAC GACGAACGCG GACGGGCTGG AGGTGTACCG
CCCAGGCGCT CGACGAGGGC GGGATCGACC CGGCCACCAT 
GCGCCCACGG GGGCGGCATC ACCTGCGTGT TCCTCGCCGC 
ACATCGAGAA CCCCGCCGTC CTCACGGCCC ACCACTACCC 
CCCCGGTCTT CGCACGGGCC ACCTGGCTGG GCCCGCCGCA 
CCGCGACGGC CGGCATCCTC GGACACCGAA CGGTGCACCA 
GCGAGGTCGC CCTCGACAAC ATGGCCCGGG TCATCGGCGC 
GCGTCCAGCG GGGGCACGTC CTCGCCGACG TGGACCACCT 
GCGAGGATCT CGATACGGTC CGCCGGGTCT GCGCCGCACG 
TCGCCCTTTT GCACACCGAC ATAGCCCGCG AGGATCTGCT 
TGGCGTGACA ATACCCGGTA AAAGGCCCGC GACGCTGCGC 
AAAGAAGAGC GTCACCGCAC AGCGCGGCAG CCCGGTCCTT 
GATCTGGTTT CTCCAGCAAT TGGACCCGGA GAGCAACGCC 
ACGCCTGCGC GGTCTATTGG ACGCGCCGGC CCTGGAGCGT 
GCGCCACGAG GCGTTGCGGA CGGTGTTCGA CACCGCCGAC 
GCTTCCCGCC CCGGAACACC TCCTGCGCCA CGCGCGGGCG 
CCGGCTCGTC CGCGACGAGA TCGCCGCGCC GTTCGACCTC 
GGCCCTGCTG ATCCGCCTCG GTGACGACGA CCACGTTCTC 
CGCCGGCGAC GGCTGGTCGT TCGGGCTCCT CCAACATGAA 
GCTGCGCGAC ACTGCCCGCC CTGCCGAACT GCCGCCGTTG 
CGCCGCCTGG GAGCGGCGCG AACTCACCGG CGCCGGACTG 
GCGCGAGCAA CTCCGGGGCG CCCCGGCGCG GCTCGCCCTC 
GCCGGTCGCC GACGCGGACG CGGGCATGGC CGAGTGGCGG 
CGCGGTCCTC ACGCTCGCGC GCGACTCCGG TGCGTCCGTG 
CTTCCAAGCG GTCCTCGCCC GGCAGGCGGG CACGCGGGAC 
GGCGAACCGT ACGCGGGCGG CGTACGAGGG CCTGATCGGC
GCTGCGCGGC GACCTCTCGG GCGATCCGTC GTTCCGGGAA 
CACGACCACG GACGCGTTCG CCCACGCCGA CCTGCCGTTC 
CGCACCGGAA CGCGACCTGT CGGTCAACCC GGTCGTCCAG 
GCGCGACGCG GCGACGGCCG CGCTGCCCGG CATCGCGGCC 
CTGGTTCACC CGCTTCGACC TCGAATTCCA TGTGTACGAG 
CGGCGAACTG CTCTACAGCC GTGCGCTGTT CGACGAGCCA 
49081 CGGATCACGG GGTTGCTGGA
49141 GACGTACGGC TGTCGCGGCT
49201 TCGAACGACA CGGCGCGGGA
49261 GCCGCACGCA CCCCCGGCGC
49321 CAGCTGGACC GGCGGGCGAA
49381 GGCGACCTGG TCGGGATCTG
49441 ATCCTCAAGG CGGGCGCCGC
49501 GCGTTCGTGC TGGCCGACGC
49561 CGGTTCCCCG ATGTGCCGCA

49621 GACGACACGG CGCCGGACGT
49681 TCCGGGTCGA CCGGCAGGCC
49741 CTGCTCTGGC AGGAGCGCAC
49801 ACGCCCACGT TCGACTACTC
49861 GTCATCCCGC CGGACGAGGT

49921 CAGGCGATTA CCGGGATCTA
49981 GATCCGCACA GCGACCAGCT
50041 ATCCTCGACG CGCGGTTGCG 
50101 CACTACGGTC CGGCCGAAAG 
50161 GCGTGGCCCG CCACCGCACC 

50221 GACGAGGCGA TGCGGCCGGT
50281 GGCCTCGCCC GTGGGTACCT 
50341 GATGCGGTCG GCGAGGAGCG 
50401 GGCGACCTGG AATTCCTCGG 
50461 GAACCGGGTG AGATCGAGAG

50521 TCCGTGCGCG AGGACCGGCG
50581 GGCCGGCACG GCGACGACTT 
50641 GCCGCGCTCG TGCCCTCCGC 
50701 AAGGTGGACC GGCGCGCGCT 
50761 ACGCCCCGCA CCGATGCCGA

50821 CCGCGGGTCG GTGCCGACGA
50881 CGGGTCGTCT CCCGCATCCG 
50941 GACGGGCGGA CGCCCGCCGC 
51001 CCCCCGATCG CGCCCTCCGC 
51061 ATGCTGCACT CGCACGGCTC 

51121 TTCCGGCTGC GCGGGCCACT
51181 GCGCGCCACG AGCCGCTGCG 
51241 GCTCCGGTGC GCGCCGAGGT 
51301 GTCGCCCACC GGGAGCTGAC 
51361 GTGCTGCTGC CGCTGGGCGC 

51421 GGTGACGGAT GGTCCTTCGA
51481 CCGGTGTCCT ACACGGACGT 
51541 GAGAACGACC GGGCCTACTG 
51601 GCGGTCCGGC CCGGCGGGGC 
51661 GCCGTCCTGG CGGCACGCCG 

51721 CTCGGCGCCT TCGCCCTGGT
51781 ACGCCGTTCG CGGACCGGGG 
51841 GTCCTCGCGC TGCGCCTCGA 
51901 GTGCACACCG CGATGGTGGG 
51961 GCCGAGGACC CCGCGCTGCC 

52021 GCGGAACTGC GGCTGCCCGG
52081 GACGAGATGA CCGGCGAACT 
52141 GCGGTGGTCC ACGATGCCGC 
52201 GTGGAGGCGA CGCTGCGTGC 
52261 GAAAGCGAGT AGCCATGCCC 

52321 CGGAACTCCA GAAGACCCGT
52381 GGATGGCCTG CCGGCTGCCC
52441 AGTCCGGTGG CGACGGCATC 
52501 ACGGTCGCGG CGGCTTCCTC 
52561 GCCCGCGCGA GGCGCTGGCG 

52621 AGGCGTTCGA GCACGCGGGC
52681 TCCTCGGCGC GTTCTTCCAG 
52741 CGAGCATTCA CACGAGCGTG 
52801 CGGCGGTCAC GGTCGACACG 
52861 AGTCGCTGCG CTCCGGCGAA 
GGAGTTCACG GCGGTGCTTC AGGCGGTCAC CGCCGACCCG
GCCGGCCGGC GACGCGACGG CGGCAGCGCC CGTGGTGCCC 
CCTGCCCGTC GACACGCTGC CGGGCCTGCT GGCCCGGTAC 
CGTGGCCGTC ACCGACCCGC ACATCTCCCT CACCTACGCG 
CCGCCTCGCG CACCTGCTCC GCGCGCGCGG CACCGCCACC 
CGCCGATCGC GGCGCCGACC TGATCGTCGG CATCGTGGGG 
TTATGTGCCG CTGGACCCCG AACATCCTCC GGAGCGCACG 
GCAGCTGACC ACGGTGGTGG CGCACGAGGT CTACCGTTCC 
CGTGGTGGCG TTGGACGACC CGGAGCTGGA CCGGCAGCCG 
CGAGCTGGAC CGGGACAGCC TCGCCTACGC GATCTACACG 
GAAGGCCGTG CTCATGCCGG GTGTCAGCGC CGTCAACCTG 
GATGGGCCGC GAGCCGGCCA GCCGCACCGT CCAGTTCGTG 
GGTGCAGGAG ATCTTTTCCG CGCTGCTGGG CGGCACGCTC 
GCGGTTCGAC CCGCCGGGAC TCGCCCGGTG GATGGACGAA 
CGCGCCGACG GCCGTACTGC GCGCGCTGAT CGAGCACGTC 
CGCCGCCCTG CGGCACCTGT GCCAGGGCGG CGAGGCGCTG 
CGAGCTGTGC CGGCACCGGC CCCACCTGC<3 CGTGCACAAT 
CCAGCTCATC ACCGGGTACA CGCTGCCCGC CGACCCCGAC 
GATCGGCCCG CCGATCGACA ACACCCGCAT CCATCTGCTC 
TCCGGACGGT ATGCCGGGGC AGCTCTGCGT CGCCGGCGTC 
GGCCCGTCCC GAGCTGACCG CCGAGCGCTG GGTGCCGGGA 
CATGTACCTC ACCGGCGACC TGGCCCGCCG CGCGCCCGAC 
CCGGATCGAC GACCAGGTCA AGATCCGCGG CATCCGCGTC 
CCTGCTCGCC GAGGACGCCC GCGTCACGCA GGCGGCGGTG 
GGGCGAGAAG TTCCTGGCCG CGTACGTCGT ACCGGTGGCC 
CGCCGCGTCG CTGCGCGCGG GACTGGCCGC CCGGCTGCCC 
CGTCGTCCTG GTGGAGCGAC TGCCGAGGAC CACGAGCGGC 
GCCCGACCCG GAGCCGGGCC CGGCGTCGAC CGGGGCGGTT 
GCGGACGGTG TGCCGGATCT TCCAGGAGGT GCTCGACGTC 
CGACTTCTTC ACGCTCGGCG GGCACTCCCT GCTCGCCACC 
CGCCGAGCTG GGTGCCGATG TCCCGCTGCG TACGCTCTTC 
GCTCGCCCGT GCGGCGGACG AGGCCGGCCC GGCCGCCCTG 
GGAGAACGGG CCGGCCCCCC TCACCGCGGC ACAGGAACAG 
GCTGCTCGCC GCGCOCTCCT ACACGGTCGC CCCGTACGGG 
CGACCGCGAA GCGCTCGACG CGGCACTGAC CCGGATCGCC 
GACCGGGTTC CGCGATCGGG AACAGGTCGT CCGGCCGCCC 
GGTTCCGGTG CCGGTCGGCG ACGTCGACGC CGCGGTCCGG 
CCGGCCGTTC GACCTCGTGA ACGGGTCGTT GCTGCGTGCC 
CGAGGATCAC GTGCTGCTGC TGATGCTGCA CCACCTCGCC 
CCTCCTGGTC CGGGAGTTGT CGGGGACGCA ACCGGACCTT
GGCCCGGTGG GAACGGAGTC CGGCCGTGAT CGCGGCCAGG 
GCGCCGGCGG CTGGGGGGCG CCACCGCGCC GGAGCTGCCC 
ACCGACCGGG CGGGCGTTCC TGTGGACGCT CAAGGACACC 
GGTCGCGGAC GCCCACGACG CGACGTTGCA CGAAACCGTG 
CGTGGCGGAG ACCGCCGACA CCGACGACGT GCTCGTCGCG 
GTACGCCGGG ACCGACCACC TCATCGGCTT CTTCGCGAAG 
CCTCGGCGGC ACGCCGTCGT TCCCCGAGGT GCTGCGCCGG 
CGCGCACGCC CACCAGGCGG TGCCCTACTC CGCGCTGCGC 
GCCGGCCCCC GTGTCGTTCC AGCTCATCAG CGCGCTCAGC 
CATGCACACC GAGCCGTTCC CCGTCGTCGC CGAGACCGTC 
GTCGATCAAC CTCTTCGACG ACGGTCGCAC CGTCTCCGGC
GCTGCTCGAC CGTGCCACCG TCGACGATTT GCTCACCCGG 
CGCCGCGGGC GACCTCACCG TACGCGTCAC CGGTTACGTG 
GAGCAGGACA AGACAGTCGA GTACCTTCGC TGGGCGACCG 
GCGGAACTCG CCGCGCACAG CGAGCCGTTG GCGATCGTGG 
GGCGGGGTCG CGTCGCCGGA GGACCTGTGG CAGTTGCTGG 
ACCGCGTTCC CCACGGACCG GGGCTGGGAG ACCACCGCCG 
ACCGGGGCGG CCGGCTTCGA CGCGGCGTTC TTCGGCATCA 
ATGGACCCGC AGCAGCGCCT GGCCCTGGAG ACCTCGTGGG 
ATCGATCCGC AGACGCTGCG GGGCAGTGAC ACGGGGGTGT
GGGTACGGCA TCGGCGCCGA CTTCGACGGT TACGGCACCA
CTCTCCGGCC GCCTCGCGTA CTTCTAC<3GT CTGGAGGGTC 
GCGTGTTCGT CGTCGCTGGT GGCGCTGCAC CAGGCCGGGC 
TGCTCGCTCG CCCTGGTCGG CGGCGTCACG GTGATGGCCT 
52921 CGCCGGCGGG GTTCGCGGAC
52981 AGGCCTTCGC GGAAGCGGCT 
53041 TCGAGAAGCT CTCCGACGCC 
53101 CCGCCGTCAA CCAGGACGGT 
53161 AGCGGGTGAT CCGGCAGGCC
53221 TCGAGGCCCA CGGCACCGGC
53281 CCACCTACGG GCAGGGGCGC 
53341 GCCACACCCA GGCCGCCGCG 
53401 ACGGCACCCT GCCCCGCACC

53461 CCGGCGCCGT CGAACTCCTC
53521 GCGCCGGTGT CTCCTCCTTC 
53581 ACCCCCGACC GGCCCCCGAA 
53641 TCTCGGCCCG CACCCCGCAG 
53701 ACGACAACCC CGGCGCGGAC 

53761 TCGAGCACCG CGCCGTGCTG
53821 GCGGACCGGT GGTCTTCGTC 
53881 AACTCGCGTC CACCTACCCC 
53941 ACCCCACCCA GGGCCCGGCC 
54001 GGTCCTGGGG CATCACCCCG

54061 CGCACGCCGC CGGTGTCCTG
54121 GCCTGATGGA CCAACTGCCG
54181 AGGCACGCCA GGTGCTGCGG
54241 TCGTGCTGTC CGGGGACGAG
54301 ACCGCCTGCC GACCCGCCAC

54361 TCCTCGACGT CGCCCGGACC
54421 CCACCACCGC CGAATACTGG
54481 CCGAGCAGTA CCCGGGCGCG
54541 TCGTCGACGG CGTTGCCGCC
54601 CGCTCGCGCA GCTCCACGTC

54661 ACCGCGCGCC CGTCACGCTG
54721 CCACCTCCCG GGCCGATGTG
54781 GCGCCGCGGT CGCGCTGCCC
54841 CCTCCCATCC GTGGCTCGGC
54901 CCTTCCTCGA ACTCGCGGCG

54961 TCGTCATCGA GACGCCGCTC
55021 TCGCCGAACC CGACGACACG 
55081 CGGGCCTGTG GACCCGACAC 
55141 CCACGGACCC GGCACCCTGG 
55201 ACGACCGGTT CGAGGACATC 

55261 CCTGGCGCGC CGGCGACACC
55321 ACGCCGCCCG TTTCACGCTG 
55381 TGGCCGCGCT CGACGCACCC 
55441 GCATCCACGC GGCCGGGGCG
55501 GCACCGTCCG CATGACCGGC 

55561 CGCGCCCGTA CGCGGAAGGC
55621 CGATGCCCGT CCCGTCCGCG 
55681 ACGGCGACGT TCCGGCGGCC 
55741 GGCACCTGTC CGCCGCCGAG 
55801 CTGCCGCCGC CGCGGGTCTG 

55861 TCGTCGAGGC GTCCCCGGAC
55921 AACCGCAGCT GGCCGTCCGG 
55981 ACCCCGCGCA CGGCCCGCTG 
56041 CCGGCACGTT GCACGACGTC 
56101 CCGGCGAGGT CCGCATCGAC 

56161 CGCTCGGGAC GTACACCGGG
56221 AGACCGGGCC CGGCGTGGAC 
56281 GCGGCATCGG CCCGACGGCC 
56341 GGAGCTTCAC CACGGCGGCG 
56401 TCGACCTCGG CACACTGCGC 

56461 TCGGCATGGC CGCCGCACAG
56521 GTACCGGCAA GCAGCACGTC 
56581 CTCGGACGAC CGCGTTCCGG 
56641 CCGGCGAGTT CATCGACGCG 
56701 TGGGCCGCAC CGAGCTGCGC 
TTCTCCGAGC AGGGCGGCCT GGCCCCCGAC GCGCGC^GCA
GACGGCACCG GTTTCGCCGA GGGGTCCGGC GTCCTGATCG 
GAGCGCAACG GCCACCGCGT GCTGGCGGTC GTCCGGGGTT 
GCCTCCAACG GGCTGTCCGC GCCGAACGGG CCGTCGCAGG 
CTGGCCAACG CCGGACTCAC CCCGGCGGAC GTGGACGCCG 
ACCAGGCTGG GCGACCCCAT CGAGGCACAG GCCGTGCTGG 
GACACCCCTG TGCTGCTGGG CTCGCTGAAG TCCAACATCG 
GGCGTCGCCG GTGTCATCAA GATGGTCCTC GCCATGCGGC 
CTGCACGTGG ACACGCCGTC CTCGCACGTC GACTGGACGG 
ACCGACGCCC GGCCCTGGCC CGAAACCGAC CGCCCACGGC 
GGCGTCAGCG GCACCAACGC CCACATCATC CTCGAAAGCC 
CCCGCCCCGG CACCCGACAC CGGACCGCTG CCGCTGCTGC 
GCACTCGACG CACAGGTACA CCGCCTGCGC GCGTTCCTCG 
CGGGTCGCCG TCGCGCAGAC ACTCGCCCGG CGCACCCAGT 
CTCGGCGACA CGCTCATCAC CGTGAGCCCG AACGCCGGCC 
TACTCGGGGC AAAGCACGCT GCACCCGCAC ACCGGGCGGC 
GTGTTCGCCG AAGCGTGGCG CGAGGCCCTC GACCACCTCG 
ACGCACTTCG CGCACCAGAC CGCGCTCACC GCGCTCCTGC 
CACGCGGTCA TCGGCCACTC CCTCGGTGAG ATCACCGCCG 
TCCCTGAGGG ACGCGGGCGC GCTCCTCACC ACCCGCACCC 
TCGGGCGGCG CGATGGTCAC CGTCCTGACC AGCGAGGAAA 
CCGGGCGTGG AGATCGCCGC CGTCAACGGC CCCCACTCCC 
GAAGCCGTAC TCGAAGCCGC CCGGCAGCTC GGCATCCACC 
GCCGGCCACT CCGAGCGCAT GCAGCCACTC GTCGCCCCCC 
CTGACGTACC ACCAGCCCCA CACCGCCATC CCCGGCGACC 
GCGCACCAGG TCCGCGACCA AGTACGTTTC CAGGCGCACA 
ACGTTCCTCG AGATCGGCCC CAACCAGGAC CTCTCGCCGC 
CAGACCGGTA CGCCCGACGA GGTGCGGGCG CTGCACACCG 
CGCGGCGTCG CGATCGACTG GACGCTCGTC CTCGGCGGGG 
CCCACGTATC CGTTCCAGCA C AAGGAC T AC TGGCTGCGGC 
ACCGGCGCGG GGCAGGAGCA GGTGGCGCAC CCGCTGCTCG 
GGCACGGGCG GAGTCGTCCT GACCGGCCGC CTGTCGCTGG 
GAGCACGCGG TCGACGGCAC CGTGCTCCTG CCCGGCGCGG 
CGCGCCGGCG ACGAGGTCGG CTGCGACCTG CTGCACGAAC 
GTGCTGCCCG CGACCGGCGG TGTGGCGGTC TCCGTCGAGA 
GGGCGGCGGG CGGTCACCGT CCACGCGCGG GCCGACGGCT 
GCCGGCGGAT TCCTCGGCAC GGCACCGGCA CCGGCCACGG 
CCGCCCGCGG AAGCCGGACC GGTCGACGTC GCCGACGTCT 
GGGTACTCCT ACGGACCGGG CTTCCGGGGG CTGCGGHCCG 
GTGTACGCCG AGGTCGCGCT CCCCGACGAG CAGAGCGCCG 
CACCCCGCGC TGCTCGACGC CGCGTTCCAG GCCGGCGCGC 
GGCGGGGCGCa CCCGACTGCC GTTCTCGTTC CAGGACGTCC 
ACGCGGCTGC GGGTCACGGT CGGCCGCGAC GGCGAGCGCA 
CCGGACGGGC AGCTGGTGGC CGTGGTCGGT GCCGTGCTGT 
TCCGGTGACG GCCTGCTGCG CCCGGTCTGG ACCGAGCTGC 
GACGATCCGC GCGTGGAGGT CCTCGGCGCC GACCCGGGCG 
ACCCGGGAGC TGACCGCCCG CGTCCTCGGC GCGCTCCAGC 
GACACCACCT TGGTGGTACG GACCGGCACC GGCCCGGCCG 
GTCCGCTCGG CGCAGGCGGA GAACCCCGGC CGCGTCGTGC 
ACCTCGGTGG AGCTGCTCGC CGCGTGCGCC GCGCTGGACG 
GACGGCGTGC TCTTCGCGCC GCGGCTGGTC CGGATGTCCG 
TCCCTGCCGG ACGGCGACTG GCTGCTCACC CGGTCCGCCT 
GCGCTCATAG CCGACGACAC GCCCCGGCGG GCGCTCGAAG 
GTCCGCGCGG CCGGACTGAA CTTCCGCGAT GTGCTGATCG 
GCCACGGCCA TGGGCGGCGA GGCCGCGGGC GTCGTGGTGG 
GACCTGTCCC CCGGCGACCG GGTGTTCGGC CTGACCCGGG 
GTCACCGACC GGCGCTGGCT GGCCCGGATC CCCGACGGCT 
TCCGTCCCGA TCGTGTTCGC GACCGCGTGG TACGGCCTGG 
GCCGGCGAGA AGGTCCTCGT CCACGCGGCC ACCGGCGGTG 
ATCGCCCGCC ACCTGGGCGC CGAGCTCTAC GCCACCGCCA 
CTGCGCGCCG CCGGGCTGCC CGACACGCAC ATCGCCGACT 
ACCGCTTTCC CGCGCATGGA CGTCGTCCTG AACGCGCTGA 
TCGCTCGACC TGCTGGACGC CGACGGCCGG TTCGTCGAGA 
GACCCGGCCG CGATCGTCCC CGCCTACCTG CCGTTCGACC 
56761 TGCTGGACGC GGGCGCCGAC 
56821 ACGCGGGCGC GCTGGAGCCG 
56881 CGCTCGGCTG GATGAGCCGC 
56941 CGCTCGACCC GGAGGGCGCC 
5 57001 TCGCCCGCCA CCTGCGCGAA 
.57061 GGACGCCCGG CGTCCACCTG 
57121 TGGAGCGGGT GGACCGGCCG 
57181 GCACCGTCGC GTCGCTCACC 
57241 GCGCCTGGTA CCTGCACGAG 

10 57301 CGTCGGCCGC CGGCGTGCTC 
57 361 TCCTCGACGC GCTCGCCGAG 
574 21 GGGGGCTCTG GGAGGACGTG 
574 81 GGATGCGGCG CAGCGGTTTC 
57 541 CGGCCGGCCG CACCGGAAGT 

15 57 601 TGCCGCTGCT GCGCGGCCTG 
57 661 CGTCCGCCGA CCGGCTCGCC 
57721 TCGTCCGGGA GAGCACCGCC 
57 781 CGGCGGCGTT CAAGGACCTC 
57 841 TCACCGAGGC GACCGGTGTG 

20 57 901 ACGTGCTCGC CGGGAAGCTC 

57 961 GGACCGCGGC CACGGCCGGT 
58021 GGCTGCCCGG CGGGGTCGCG 
58081 ACGCCATCAC GGAGTTCCCG 
58141 ACCCCGACGC GATCGGCAAG 

25 58201 GCTTCGACGC GGCGTTCTTC 
58261 AGCGGGTGCT CCTGGAGACG 
58321 CGACCCGCGG CAGCGACACC 
58381 GTGCGGACAC CGACGGCTTC 
584 41 TGTCGTACTT CTACGGTCTG 

30 58501 CGCTGGTGGC GCTGCACCAG 
58561 TGGTCGGCGG CGTCAGGGTG 

58 621 GCGGCCTCGC GCCGGACGGC 
58 681 TCGCCGAGGG TGCCGGTGTG 
58741 ACACCGTCCT GGCGGTCGTC 

35 58801 TGTCGGCGCC GAACGGGCCG 
588 61 GGCTCACCCC GGCGGACGTG 
58 921 ACCCCATCGA GGCACAGGCG 
58 981 TGCTGGGCTC GCTGAAGTCC 
59041 TCATCAAGAT GGTGCAGGCC 

40 59101 AGCCGTCGCC GCACGTCGAC 
59161 CGTGGCCCGA GACCGACCGG 
59221 CCAACGCCCA CGTCATCCTG 
59281 CCGGTGACCT TCCCCTGCTG 
59341 GCCGACTGCG CGCCTACCTG 

45 59401 CGCTGGCCCG GCGCACACAC 
594 61 CCACACCCCC CGCGGACCGG 
59521 AGCATCCCGC GATGGGCGAG . 
. 59581 ATGAAGCGCT CCGCCGCCTT 
59641 TGCTCTTCGC CCACCAGGCG 

50 59701 ACGCGGTCAT CGGCCACTCG 
59761 GGCTGGACGA CGCGTGCACC 
59821 CACCCGGTGC CATGGTCACC 
59881 CGGGCGTGGA GATCGCCGCC 
59941 ACGCCGTGCT CACCGTCGCC 

55 60001 CCGGGCACTC CGCGCACATG 
60061 . TCCGCTACCA CCCTCCCCAC 
60121 CCGAGCAGGT CCGCAAGCCC 
60181 TGTTCGTGGA GATCGGCCCC 
60241 AGAACGGCAC CGCGGACGAG 

60 60301 GCGGTGCCAC GCTCGACTGG 
60361 TGCCCGCGTA CGCGTTCCAA 
60421 CCGACGCGGG CCACCCCGTG 
604 81 TGTTCACGGG TTCCGTGCCG 
60541 TGGCCGCCGC GGACGCGGTC 
CGCATCGGCG AGATCCTGGG CGAACTGCTC CGGCTGTTCG 
CTGCCGGTCC GTGCCTGGGA CGTCCGGCAG GCACGCGACG 
GCCCGCCACA TCGGCAAGAA CGTCCTGACG CTGCCOCGGC 
GTCGTCCTCA CCGGCGGCTC CGGCACGCTC GCCGGCATCC 
CGGCATGTCT ACCTGCTGTC CCGGACGGCA CCGCCCGAGG 
CCCTGCGACG TCGGTGACCG GGACCAGCTG GCGGCGGCCC 
ATCACCGCCG TGGTGCACCT CGCCGGTGCG CTGGACGACG 
CCCGAGCGTT TCGACACGGT GCTGCGCCCG AAGGCCGACG 
CTGACGAAGG AGCAGGACCT CGCGGCGTTC GTGCTCTACT 
GGCAACGCCG GCCAGGGCAA CTACGTCGCC GCGAACGCGT 
CTGCGCCACG GTTCCGGGCT GCCGGCCCTC TCCATCGCCT 
AGCGGGCTCA CCGCGGCGCT CGGCGAAGCC GACCGGGACC 
CGGGCCATCA CCGCGCAACA GGGCATGCAC CTGTACGAGG 
CCCGTGGTGG TCGCGGCGGC GCTCGACGAC GCGCCGGACG 
CGGCGGACGA CCGTCCGGCG GGCCGCCGTC CGGGAGTGTT 
GCGCTGACCG GCGACGAGCT CGCCGAAGCG CTGCTGACGC 
GCCGTGCTCG GCCACGTGGG TGGCGAGGAC ATCCCCGCGA 
GGCATCGACT CGCTCACCGC GGTCCAGCTG CGCAACGCCC 
CGGCTGAACG CCACGGCGGT CTTCGACTTC CCGACCCCGC 
GGCGACGAAC TGACCGGCAC CCGCGCGCCC GTCGTGCCCC 
GCGCACGACG AGCCGCTGGC GATCGTGGGA ATGGCCTGCC 
TCACCCGAGG AGCTGTGGCA CCTCGTGGCA TCCGGCACCG 
ACGGACCGCG GCTGGGACGT CGACGCGATC TACGACCCGG 
ACCTTCGTCC GGCACGGTGG CTTCCTCACC GGCGCGACAG 
GGCATCAGCC CGCGCGAGGC CCTCGCGATG GACCCGCAGC 
TCGTGGGAGG CGTTCGAAAG CGCCGGCATC ACCCCGGACT 
GGCGTGTTCG TCGGCGCCTT CTCCTACGGT TACGGCACCG 
GGCGCGACCG GCTCGCAGAC CAGTGTGCTC TCCGGCCGGC 
GAGGGTCCGG CGGTCACGGT CGACACGGCG TGTTCGTCGT 
GCCGGGCAGT CGCTGCGCTC CGGCGAATGC TCGCTCGCCC 
ATGGCGTCTC CCGGCGGCTT CGTGGAGTTC TCCCGGCAGC 
CGGGCGAAGG CGTTCGGCGC GGGTGCGGAC GGCACGAGCT 
CTGATCGTCG AGAGGCTCTC CGACGCCGAA CGCAACGGTC 
CGTGGTTCGG CGGTCAACCA GGATGGTGCC TCCAACGGGC 
TCGCAGGAGC GGGTGATCCG GCAGGCCCTG GCCAACGCCG 
GACGCCGTCG AGGCCCACGG CACCGGCACC AGGCTGGGCG 
GTACTGGCCA CCTACGGACA GGAGCGCGCC ACCCCCCTGC 
AACATCGGCC ACGCCCAGGC CGCGTCCGGC GTCGCCGGCA 
CTCCGGCACG GGGAGCTGCC GCCGACGCTG CACGCCGACG 
TGGACGGCCG GCGCCGTCGA ACTGCTGACG TCGGCCCGGC 
CCACGGCGTG CCGCCGTCTC CTCGTTCGGG GTGAGCGGCA 
GAGGCCGGAC CGGTAACGGA GACGCCCGCG GCATCGCCTT 
GTGTCGGCAC GCTCACCGGA AGCGCTCGAC GAGCAGATCC 
GACACCACCC CGGACGTCGA CCGGGTGGCC GTGGCACAGA 
TTCGCCCACC GCGCCGTGCT GCTCGGTGAC ACCGTCATCA 
CCCGACGAAC TCGTCTTCGT CTACTCCGGC CAGGGCACCC 
CAGCTCGCCG CCGCCCATCC CGTGTTCGCC GACGCCTGGC 
GACAACCCCG ACCCCCACGA CCGCACGCAC AGCCAGCATG 
GCGTTCACCG CCCTCCTGCG GTCCTGGGGC ATCACCCCGC 
CTGGGCGAGA TCACCGCGGC GCACGCCGCC GGCATCCTGT 
CTGATCACCA CGCGCGCCCG CCTCATGCAC ACGCTCCCGC 
GTACTGACCA GCGAAGAGAA GGCACGCCAG GCGTTGCGGC 
GTCAACGGGC CCCACTCCAT CGTGCTGTCC GGGGAC<3AGG 
GGGCAGCTCG GCATCCACCA CCGCCTGCCC GCCCGGCACG 
GAGCCCGTGG CCGCCGAGCT GCTCGCCACC ACCCGC<^GC 
ACCTCCATTC CGAACGACCC CACCACCGCT GAGTACTGGG 
GTGCTGTTCC ACGCCCACGC GCAGCAGTAC CCGGACGCCG 
GCCCAGGACC TCTCCCCGCT CGTCGACGGG ATCCCGCTGC 
GTGCACGCGC TGCACACCGC GCTCGCGCAC CTCTACGCGC 
CCCCGCATCC TCGGGGCTGG GTCACGGCAC GACGCGGATG 
CGGCGGCACT ACTGGATCGA GTC<3GCACGC CCGCCCGCAT 
CTGGGCTCCG GTATCGCCCT CGCCGGGTCG CC<5GGCCX3GG 
ACCGGTGCGG ACCGCGCGGT G T TCGTCGCC GAGCTGGCGC 
GACTGCGCCA CGGTCGAGCG <3CTCGACATC GCCTCOGTGC 
60601 CCGGCCGGCC GGGCCATGGC 
60661 ACGGCCGGCG CCGGTTCACC 
60721 CCGAGGGGGT GCTGCGCCCC 
60781 CCCCACCGGG CGCGGTGCCC 
5 60841 TCTTCGCCGA GGCCGAGGTG 
.60901 ACGCGGTCTT CTCCGCGGTC 
60961 CGGTGCACGC GTCGGACGCC 
61021 CCATGGGATT CGCCGCCTTC 
61081 CGCTGCGGGA GGTGGCGTCA 

10 61141 AGTGGCTCGC GGTCGCCGAG 
61201 TCACCGCCGC CCACCCCGAC 
61261 CCCGCGTCCT GACCGCCCTG 
61321 ACACCACCAC CGACCCGGCC 
61381 AACACCCCCA CCGCATCCGC 

15 614 41 CCCAACTCGC CACCCTCGAC 
61501 CCCACCTCAC CCCCCTCCAC 
61561 ACGCCATCAT CATCACCGGC 
61621 ACCACCCCCA CACCTACCTC 
61681 ACCTCCCCTG CGACGTCGGC 

20 61741 AACCCCTCAC CGCCATCTTC 
61801 TCACCCCCGA CCGCCTCACC 
61861 ACCACCTCAC CCAAAACCAA 
61921 TCCTCGGCAG CCCCGGACAA 
61981 CCACCCACCG CCACACCCTC 

25 62041 CCACCAGCAC CCTCACCGGA 
62101 GTTTCCTCCC GATCACGGAC 
62161 GCGAGGACTT CGTCATGGCC 
62221 CGCCCATCCT GAGCGGCCTG 
62281 TCGCCCAGCG GCTCGCCGAG 

30 62341 TCTCGGACGC CACGGCCGCC 
624 01 CGACGTTCAA GGACCTCGGC 
624 61 CGGAGGCGAC CGGGCTGCGG 
62521 TCCTCGCCGC CAAGCTCCGC 
62581 CGGCACGGAC CCACCACGAC 

35 62641 GCGGGGTCGC CTCGCCGGAG 
62701 CCGAGTTCCC CACCGACCGC 
627 61 CCCCCGGCAA GACCTACGTC 
62821 'CCGCGTTCTT CGGCATCAGC 
62881 TCCTCGAAAC CTCCTGGGAG 

40 62941 GCAGCGACAC CGGCGTGTTC 
63001 TGGGCGGGTT CGGCGCCACC 
63061 TCTTCGGCAT GGAGGGCCCG 
63121 CCCTGCACCA GGCGGCACAG 
63181 GTGTCACGGT GATGCCCACC 

45 63241 CCCCCGACGG CCGTTGCCAG 
63301 GCGCCGGCGT TCTTGTGCTG 
63361 TCGCGGTCGT CCGCTCCTCC 
6342T CCAACGGCCC CTCCCAGCAG 
634 81 CCGCCGACGT. GGACGTGGTG 

50 63541 AGGCACAGGC CATCATCGCG 
63601 CGGTCAAGTC GAACATCGGA 
63661 TGGTCATGGC GATGCGCCAC 
63721 CGCATGTGGA CTGGACCGAG 
63781 ACGCGGGACG CCCGCGCCGC 

55 63841 ACGTGATCCT TGAGGGTGTT 
63901 TGCCGTTGCC GGTGTCGGCT 
63961 AGGGGTATCT GCGCGGGAGT 
64021 GTGCTGTCTT CGGTCACCGT 
64081 TGGATCAGCC GCGTACGGTG 

60 64141 GTGTGGAGTT GATGGACCGT 
64201 CGTTGTTGCC GCACACGGGC 
64261 AGCGGGTGGA GGTGGTCCAG 
64321 GGCAGGCCCA CGGGGTCGTA 
64381 CGGCGTGCGT GGCCGGGGCC 
CGGACGACCG TACAGACCTG GGTCGACGAG CCGGCGGACG 
GTGCACAeee GGAGGGGGGA CGCGCCGTGG acgctgcacg 
CATGGCACGG CCCTGCCCGA TGCGGCCGAC GCCGAGVGGC 

GCGGACGGGC tgccgggtgt gtggcgccgg ggggaccagg 

GACGGACCGG ACGGTTTCGT GGTGCACCCC GACCTGCTCG 

GGCGACGGAA gccgccagcc ggccggatgg cgcgacctga 

ACCGTACTGC GCGCCTGCCT CACCCGGCGC ACCGACGGAG 
GACGGCGCCG GCCTGCCGGT actcaccgcg GAGGCGGTGA 
CCGTCCGGCT CCGAGGAGTC GGACGGCCTG CACCGGTTGG 
GCGGTCTACG ACGGTGACCT GCCCGAGGGA CATGTCCTGA 
GACCCCGAGG ACATACCCAC CCGCGCCCAC ACCCGCGCCA 
CAACACCACC TCACCACCAC CGACCACACC CTCATCGTCC 
GGCGCCACCG TCACCGGCCT CACCCGCACC GCCCAGAACG 
CTCATCGAAA CCGACCACCC CCACACCCCC CTCCCCCTGG 
CACCCCCACC TCCGCCTCAC CCACCACACC CTCCACCACC 
ACCACCACCC CACCCACCAC CACCCCCCTC AACCCCGAAC 
GGCTCCGGCA CCCTCGCCGG CATCCTCGCC CGCCACCTGA 
CTCTCCCGCA CGCCACCCCC CGACGCCACC CCCGGCACCC 
GACCCCCACC AACTCGCCAC CACCCTCACC CACATCCCCC 
CACACCGCCG CCACCCTCGA CGACGGCATC CTCCACGCCC 
ACCGTCCTCC ACCCCAAAGC CAACGCCGCC TGGCACCTGC 
CCCCTCACCC ACTTCGTCCT CTACTCCAGC GCCGCCJCCG 
GGAAACTACG CCGCCGCCAA CGCCTTCCTC GACGCCCTCG 
GGCCAACCCG CCACCTCCAT CGCCTGGGGC ATGTGGCACA 
CAACTCGACG ACGCCGACCG GGACCGCATC CGCCGCGGCG 
GACGAGGGCA TGCGCCTCTA CGAGGCGGCC GTCGGCTCCG 
GCCGCGATGG ACCCGGCACA GCCGATGACC GGCTCCGTAC 
CGCAGGAGCG CGCGGCGCGT CGCCCGTGCC GGGCAGACGT 
CTGCCCGACG CCGACCGCGG CGCGGCGCTG ACCACCCTCG 
GTGCTCGGCC ACGCCGACGC CTCCGAGATC GCGCCGACCA 
ATCGACTCGC TCACCGCGAT CGAGCTGCGC AACCGGCTCG 
CTGAGTGCCA CGCTGGTGTT CGACCACCCG ACACCTCGGG 
ACCGATCTGT TCGGCACGGC CGTGCCCACG CCCGCGCGGA 
GAGCCACTCG CGATCGTCGG CATGGCGTGC CGACTGCCCG 
GACCTGTGGC AGCTCGTGGC GTCCGGCACC GACGCGATCA 
GGCTGGGACA TCGACCGGCT GTTCGACCCG GACCCGGACG 
CGGCACGGCG GCTTCCTCGC CGAGGCCGCC GGCTTCGATG 
CCGCGCGAGG CACGGGCCAT GGACCCGCAG CAGCGCGTCA 
GCGTTCGAGA ACGCGGGCAT CGTGCCGGAC ACGCTGCGCG 
ATGGGCGCGT TCTCCCATGG GTACGGCGCC GGCGTCGACC 
GCCACGCAGA ACAGCGTGCT CTCCGGCCGG TTGTCGVACT 
GCCGTCACCG TCGACACCGC CTGCTCGTCG TCGCTGGTCG 
GCGCTGCGGA CTGGAGAATG CTCGCTGGCG CTCGCCGGCG 
CCGCTGGGCT ACGTCGAGTT CTGCCGCCAG CGGGGACTCG 
GCCTTCGCGG AAGGCGCCGA CGGCACGAGC TTCTCGGAGG 
GAGCGGCTCT CCGACGCCGA GCGCAACGGA CACACCGTCC 
GCCGTCAACC AGGACGGCGC CTCCAACGGC ATCTCCGCAC 
CGCGTCATCC GCCAGGCCCT CGACAAGGCC GGGCTCGCCC 
GAGGCCCACG GCACCGGAAC CCCGCTGGGC GACCCGATCG 
ACCTACGGCC AGGACCGCGA CACACCGCTC TACCTCGGTT 
CACACCCAGA CCACCGCCGG TGTCGCCGGC GTCATCAAGA 
GGCATCGCGC CGAAGACACT GCACGTGGAC GAGCCGTCGT 
GGTGCGGTGG AACTGCTCAC CGAGGCGAGG CCGTGGCCCG 
GCGGGCGTGT CGTCGCTCGG TATCAGCGGT ACGAACGCCC 
CCCGGGCCGT CGCGTGTGGA GCCGTCTGTT GACGGGTTGG 
CGGAGTGAGG CGAGTCTGCG GGGGCAGGTG GAGCGGCTGG 
GTGGATGTGG CCGCGGTCGC GCAGGGGTTG GTGCGTGAGC 
GCGGTACTGC TGGGTGATGC CCGGGTGATG GGTGTGGCGG 
TTCGTCTTTC CCGGGCAGGG TGCTCAGTGG GTGGGCATGG 
TCTGCGGTGT TCGCGGCTCG TATGGAGGAG TGTGCGCGGG 
TGGGATGTGC GGGAGATGTT GGCGCGGCCG GATGTGGCGG 
CCGGCCAGCT GGGCGGTCGC GGTCAGCCTG GCCGCACTGT 
CCCGACGCGG TGATCGGACA CTCCCAGGGC GAGATCGCGG 
CTCAGCCTTG AGGACGGCGC CCGCGTGGTG GCCTTGCGCA 
GCCAGGTCAT 
CCGGTGAGGT 
CAGTCGTGGC 
GCGTGCGAGT 
TCGAGGACGA 
GGTGGTCGAC 
GGAACCTGCG 
TCGTGGAGTG 
CGTCGTTGCG 
GGACCCTGGG 
TCGATCTGCC 
CCGACCTGTC 
CACTACCCGC 
CCTGGCTGGC 
AGCTGGTCAT 
AATCCCCCCT 
CtGACGAGGC 
GGACCCGGCA 
GTGTTGTCGG 
CCTCGGAGTT 
GAATGCGGGC 
ACCGTGCCGC 
AGAGCGGCAG 
CCTGGCACGG 
CGGGCCCGGA 
TCGACGCGCT 
GGGTCGGGTG 
TGACGCTGCG 
TTCTCGACGC 
CCGCCAAGGC 
TCCTCGTCGA 
CACTCGGCGA 
GGGCCACGCC 
CCGGTTCCCT 
CCGGCGAGGT 
CGCTCGGTGT 
AGACCGGCCC 
GCGCCTTCGG 
GGACGTTCCC 
TCGACCTGGC 
TCGGCGCGGC 
GCGCCGCGAA 
CCGCGTTCGC 
TCCTCGACGC 
CGGACATCCG 
TGCAGCGGAT 
CGGTCCACGC 
GTCACACCGG 
TCATCACCGG 
ACACCTACCT 
GCGACGTCGG 
CCGCCGTCTT 
ACCGCGTCGA 
CCCGCGACAC 
GCCCGGGGCA 
GCCGTGCGCA 
CGCTCACCGC 
CGTTGAGCGC 
TCGTCGTCGC 
GCGGTCTGGC 
AGCCCCTGGC 
AGGTCGTGCT 
CGGACCGTCC 
GGCTCGCGGC 



CGCGGCGCGA 
CGGTCTGGTC 
CGGCGAGCCG 
GCGTCGTATC 
ACTCGCTGAG 
CGTGGACAGC 
TCGCCCCGTC 
CAGCGCCCAT 
CACCGGTGAC 
CGCGGCAGTG 
CACCTACGCG 
CGCGGCCGGG 
CGACGACGGT 
TGATCACGCG 
CCGGGCCGGT 
CGTGGTGCCG 
CGGACGGCGG 
CGCCAGCGGC 
TGCGGAGCCG 
CTACTTGCGC 
TGCCTGGCGT 
CGACGCGGAC 
CCTGCTCATG 
CGTCCGGTTC 
CGGCCTCCGG 
CGTGACCCGG 
GGCCCCGGTG 
CGGCGACGAC 
GCTGCTCCGG 
GGCCGCAGGC 
AACGGACCCG 
GCCCCATGTG 
GTCCCTGACG 
CGACGACCTT 
GCGGATCGCG 
GGTCGCCGAT 
CGGTGTGCAC 
ACCGGTCGCG 
GCAGGCGGCG 
CGGGCTGCGC 
GGCCGTCCAG 
GCGCCATCTG 
CGACGCGTTC 
GTCCGTCGGC 
GCACGCCGTC 
CATCGTCGAG 
CTGGGACGTG 
CAAGCTGGTG 
CGGCTCCGGC 
GCTCTCCCGC 
CGACCCCCAC 
CCACACCGCC 
CACCGTCCTC 
CGACCTCGCC 
GGGCAACTAC 
AGGGCTGCCC 
GAAACTCACC 
CGCGGACGGC 
GACGACCGTC 
CGCGCACCGG 
CGTGCGTCTT 
CCGCCACGCG 
GTTCCGCGAG 
CGAGACGGGG 



CTGGCCGGGC 
GAGGGCGTGT 
TCGGCGGTGG 
GCCGTCGACT 
GTACTGAAGG 
GCCTGGGTGA 
GCGCTGGACG 
CCGGTGCTGC 
GGCGGCTGGG 
GACTGGGACA 
TTCGAGCGCC 
CTGACAGGGG 
GGTGTTGTTC 
GTGCGGGGCA 
GACGAGACCG 
GCGACCGCAG 
CGAGTGACCG 
ACCCTGACCC 
TTCTCGCAGT 
CTGGACGCGC 
GATGGTGACA 
GGTTTCGGCA 
CTGGAATCGG 
CACGCGACGG 
CTGCATGCCG 
TCCCCGGAAG 
CCCGTACCTG 
GCCGACCCGC 
GCCGACCGGC 
CTGGTCCGCA 
GGAGAGGTCC 
CGGCTGCGCG 
CTCCCGGACA 
GCCGTCGTCC 
GTACGCGCGG 
GCGCGTCCGC 
GACCTGGCGC 
ATCACCGACC 
TCCGTGATGA 
CCCGGCGAGA 
ATCGCGCGGC 
GTGGACCTGG 
CCGCCGGTCG 
CTGCTCGCGG 
CAGCAGCCGT 
CTGCTCGGCC 
CGGCAGGCGC 
CTGACGGTCC 
ACCCTCGCCG 
ACCCCACCCC 
CAACTCGCCA 
GGAACCCTCG 
AAACCCAAGG 
GCGTTCGTCG 
GTCGCGGCGA 
GCGCAGTCCC 
GACGCGGACC 
ATGCGGCTGT 
GACCTCACCC 
GCCGGGCCGG 
GCCGGGCGTA 
GCCGCGGTCC 
CTCGGTTTCG 
CTGCGGCTGC 
GGGGAGCGAT 
GGATCGCGGC 
AGGACGTGGT 
ACGCCTCCCA 
GAGTTGCAGG 
CCGAGCCGGT 
CGGCGGTGGC 
TGCCGGCGAT 
AGCGATGGCT 
CGGTGGTCGA 
GGCGCTACTG 
CAGCACATCC 
TCACCGGCCG 
CGGTCCTGCT 
GTTGCGGGAT 
CCGTGGATCT 
TCCACGCCCG 
GCGACACCCC 
GGCCACCTGC 
TGGGCTACCG 
CCGTGTACGC 
TGCACCCGGC 
ACGGCGAGCA 
GCGCGACCAT 
CGGACAGCGG 
CGGACCTCGC 
CCGGGGCCGG 
TCGGGGAGAC 
CGGTGATCTT 
CCGCTCAGAA 
TGGACGGCGC 
ACGGCCTCTT 
CCGGGTCGTG 
CCACCGACGC 
CGGGCCTGAA 
TCGGCAGCGA 
CCGGCGACCG 
GGCGGCTGCT 
CCGCGTTCGC 
AGGTCCTGAT 
ATCTGGGCGC 
ACGGAGCGCA 
ATGTCGTGCT 
CGGGTGGCCG 
TCGACCTGAT 
TGTTCGCGCG 
GGGAGGCGTT 
CGCGGCCGCT 
GCATCCTCGC 
CCGACACCAC 
CCACCCTCGC 
ACGACGCCCT 
CCGACGCCGC 
TCTACTCCGC 
ACGCGTTCCT 
TCGCATGGGG 
GCCAGCGCAT 
TCGACGCGGC 
AGCTCGACGG 
CGCGCACGGT 
CCGCCGCCGA 
TCGCGTACGG 
ATTCGCTGAC 
CGACGACGCT 



GGCTTCGGTG 
GCGTAACGGC 
GACGCGGTAT 
CACGCCCCAC 
GAAGGCCGCG 
GGATGAGAGT 
GGAGCTGGAC 
GGAACAGGCC 
GACGGCGTTG 
ACCGGTGCCA 
GCTGGAAGCG 
CATGCTGGCC 
GATCTCGTTG 
GCCGGGCACG 
AGTGGATGAA 
GTCGGTGACC 
CACCGAAGGC 
CGACACCCCC 
CACTGCCGCG 
GTTCGGACCC 
CGAGGTCGCG 
GCTGCTCGAC 
GAGCGTGCAA 
GCTGCGGGTG 
GAACCGTCCC 
GCCCGCCGAT 
TCCGTCCGAC 
CCGGGACCTG 
CCAGGTGACC 
CGAGCAGCCC 
GAAGCGCGAC 
CGAGGCAGCC 
GCAGCTGCGG 
CCCGGACCGG 
CTTCCGGGAT 
GGCCGCGGGT 
GGTCCTGGGG 
CGGCCGGATG 
GACCGCGTGG 
CCACGCGGCG 
GGAGGTGTAC 
TCTGGCCGAT 
CAACTCGCTC 
GTTCATCGAG 
GGACGCCGGC 
CGACGTGCTG 
CGGCTGGATG 
GGATCCCGAG 
CCCCCACCTG 
CCCCGGCACC 
CCGCATCCCC 
GCTCGACAAC 
CTGGCACCTG 
GGTCGCCGGC 
CGACGCGCTC 
CATGTGGGCG 
CCGGCGCAGC 
GACGCGTACC 
CGCCGTCGCG 
CGCCCGCAAC 
GCAGCGGCGC 
GCTGGGCGAC 
CGCGGTCGAC 
GGTGTTCAGC 



GCATTGCCGG 
CCCGCCTCGA 
GAGACCGAAG 
GTGGAA^CCA 
TCGGTGGCGT 
TACTGGTACC 
GGGTCCGTGT 
CACACGGTGG 
GCGCAGGCGT 
GGGCGGCTGC 
GCCGGTGCCA 
GCCATCACGG 
CGCACGCATC 
GCCTTTGTGG 
CTGGTCATCG 
GTGGAAGGAG 
ACCGGCAGCT 
AACGCTTCCG 
GCCGTCGACA 
ATGTTCCGCG 
CTCCCCGAGG 
GCGGCCTTGC 
CTGCCG^TCT 
GCGGTCGTAC 
GTCGCGACGA 
CCGATGCTGC 
GCGGACGTGC 
ACCACCCGTG 
GGTGGCCTCG 
GGCCGCTTCT 
GCGATCGCGG 
CGGCTGATGC 
CCGTCCGCCA 
CCGCTCGCGG 
GTCACGGTCG 
GTCGTCCTGG 
ATGCTCGCGG 
CCGGACGGCT 
TACGGCCTGG 
GCGACCGGTG 
{5CGACCACCA 
TCCCGCAGCA 
ACCGGTGAAT 
ATGGGGAAGA 
CCCGACCGGA 
CACCCGCTGC 
AGCAGCGGGC 
GGGGCCGTCG 
GGCCACCCCC 
CACCTCCCCT 
CAACCCCTCA 
CTCACCCCCG 
CACCCGCTCA 
CTCATGGGCA 
GCCGAACACC 
GACGTCAGCG 
GGATTCCCGC 
CCGGAACCGG 
CCGTTGCTCC 
GCCGGCGAAG 
ATCATGHAGG 
CGCGTGGCGG 
CTGCGCAATC 
CACCCGACGG 
68281 CGGAGGCGCT CACCGCCCAC CTGCTCGACC TGATCGACGC 
68341 GGGAGTCCCT GCCCGCGGTG ACGGCCGCTC CCGTGGCGGC 
68401 CGATCGCCAT CGTGGCGATG GCGTGCCGGC TGCCCGGTGG 
68 4 61 TGTGGCGGCT CGTCGAGTCC GGCACCGACG CGATCACCAC 
5 68521 GGGACGTCGA CGCGCTGTAC GACGCGGACC CGGACGCGGC 
.68581 GGGGCGGTTA CCTGGCCGGG GCGGCGGAGT TCGACGCGGC 
68 641 GCGAAGCGCT CGGCATGGAC CCGCAGCAAC GCCTGCTGCT 
68701 TCGAGCGCGG CCGGATCAGT CCGGCGTCGC TCCGCGGCCG 
687 61 GTGCGGCCGC GCAGGGCTAC GGGCTGGGCG CCGAGGACAC 

10 68 821 GTGGTTCCAC GAGCCTGCTG TCCGGACGGC TGGCGTACGT 
68881 CGGTCACCGT GGACACGGCG TGCTCGTCGT CTCTGGTCGC 
68941 GGCTGCGCCT GGGCGAGTGC GAACTCGCTC TGGCCGGAGG 
69001 CGGCCGCGTT CGTGGAGTTC TCCCGCCAGC GCGGGCTCGC 
69061 CGTTCGGCGC GGGCGCGGAC GGCACGACGT GGTCCGAGGG 

15 69121 AACGGCTCTC CGACGCCGAG CGGCTCGGGC ACACCGTGCT 
69181 CCGTCACGTC CGACGGCGCC TCCAACGGCC TCACCGCGCC 
69241 GGGTCATCCG GAAGGCGCTC GCCGCGGCCG GGCTGACCGG 
69301 AGGGGCACGG CACCGGCACC CGGCTCGGCG ACCCGGTCGA 
69361 CGTACGGGCA GGACCGTCCG GCACCGGTCT GGCTGGGCTC 

20 69421 ATGCCACGGC CGCGGCCGGT GTCGCGGGCG TCATCAAGAT 
69481 GCACGATGCC GCGGACGCTG CATGTGGAGG AGCCCTCGCC 
69541 GACAGGTGTC CCTGCTCGGC TCCAACCGGC CCTGGCCGGA 
69601 CGGCCGTCTC CGCGTTCGGG CTCAGCGGGA CGAACGCGCA 
69661 GTCCGGCGCC CGTGGCGTCC CAGCCGCCCC GGCCGCCCCG 

25 69721 CGTGGGTGCT CTCCGCGCGG ACTCCGGCCG CGCTGCGGGC 
69781 ACCACCTCGC GGCGGCACCG GACGCGGATC CGTTGGACAT 
69841 GCCGCGCCCA GTTCGCCCAC CGTGCCGCGG TCGTCGCCAC 
69901 CCGCGCTCGA CGGCCTCGCG GACGGCGCGG AGGCGCCCGG 
69961 AGGAGCGGCG CGTCGCCTTC CTCTTCGACG GCCAGGGCGC 

30 7 0021 GCGAGCTCCA CCGCCGGTTC CCCGTCTTCG CCGCCGCGTG 
7 008i. TCGGCAAGCA CCTCAAGCAC TGCCCCACGG ACGTCTACCA 
7 0141 CCCATGACAC CCTGTACGCC CAGGCCGGCC TGTTCACGCT 
7 0201 TGCTGGAGCA CTGGGGGGTG CGGCCGGACG TGCTCGTCGG 
7 0261 CCGCGGCGTA CGCGGCGGGG GTGCTCACCC TGGCGGACGC 

35 7 0321 GGGGGCGGGC GCTGCGGGCG CTGCCGCCCG GGGCGATGCT 
7 0381 CGGAGGTCGG CGCCCGCACG GATCTGGACA TCGCCGCGGT 
70441 TGCTCGCCGG TTCGCCGGAC GATGTGGCGG CGTTCGAACG 
70501 GGCGCACGAA ACGGCTCGAC GTCGGGCACG CGTTCCACTC 
7 0561 TCGACGGCTT CCGTACGGTG CTGGAGTCGC TCGCGTTCGG 

40 7 0621 TGTCCACGAC GACGGGCCGG GACGCCGCGG ACGACCTCAT 
7 0681 GCCATGCGCG TCGGCCGGTG CTGTTCTCGG ATGCCGTCCG 
7 0741 TCACCACGTT CGTGGCCGTC GGCCCCTCCG GCTCCCTGGC 
70801- CCGGGGAGGA CGCCGGGACC TACCACGCGG TGCTGCGCGC 
70861 CGGCGCTGAC CGCCCTCGCC GAGCTGCACG CCCACGGCGT 

45 70921 TACTGGCCGG TGGCCGGCCA GTGGACCTTC CCGTGTACGC 
70981 GGCTGGCCCC GGCCGTGGCG GGGGCGCCGG CCACCGTGGC 
71041 AGTCCGAGCC GGAGGACCTC ACCGTCGCCG AGATCGTCCG 
71101 TCGGCGTCAC GGACCCCGCC GACGTCGATG CGGAAGCGAC 
71161 ACTCACTGGC GGTGCAGCGG CTGCGCAACC AGCTCGCCTC 

50 71221 CGGCGGCCGT CCTGTTCGAC CACGACACCC CGGCCGCGCT 
71281 GGATCGAGGC CGGCCAGGAC CGGATCGAGG CCGGCGAGGA 
71341 TCTCGCTCCT GGAGGAGATG GAGTCGCTCG ACGCCGCGGA 
71401 CGGAGCGTGC GGCCATCGCC GATCTGCTCG ACAAGCTCGC 
714 61 GATGAGCACC GATACGCACG AGGGAACGCC GCCCGCCGGC 

55 71521 GGACGGTCAC CGCGCCATCC TGGAGAGCGG CACGGTGGGT 
71581 CAAGCACTGG CTGGTCGCCG CCGCCGAGGA CGTCAAGCTG 
71641 CAGCTCGGCC GCGCCGTCCG AGATGCTGCC CGACCGGCGG 
71701 GGACTCACCG GAGCACAACC GCTACCGGCA GAAGATCGCG 
717 61 GGCGCGCAAG CGGGAGGACT TCGTCGCCGA GGCCGCCGAC 

60 71821 GGCCGCGGGA CCCGGCACCG ACCTCATCCC CGGGTACGCC 
71881 CATCAACGCG CTGTACGGGC TCACCCCTGA GGAGGGGGCC 
71941 CGACATCACC GGCTCGGCCG ATCTGGACAG CGTCAAGACG 
72001 GCACGCGCTG CGGCTGGTCC GCGCGAAGCG TGACGAGCGG 
72061 GCTGGCCTCG GCCGACGACG GCGAGATCTC GCTCAGCGAC 
TCCCACCGCC CGGATCGCCG 
CGCGCGGGAC CAGGACGAGC 
TGTGACGTCG CCCGAGGACC 
GCCTCCTGAC GACCGCGGCT 
CGGCAAGGCG TACAACCTGC 
GTTCTTCGAC ATCAGTCCGC 
CGAAACGGCG TGGGAGGCGA 
GGAGGTCGGC GTCTATGTCG 
CGAGGGCCAC GCGATCACCG 
GCTCGGGCTG GAGGGCCCGG 
GCTGCATCTG GCGTGCCAGG 
GGTCTCCGTA CTGAGTTCGC 
GGCCGACGGG CGCTGCAAGT 
CGTGGGCGTG CTCGTACTGG 
CGCCGTCGTC CGCGGCAGCG 
GAACGGGCTC TCGCAGCAGC 
CGCCGACGTG GACGTCGTCG 
GGCGGACGCG CTGCTCGCGA 
GCTGAAGTCG AACATCGGAC 
GGTGCAGGCG ATCGGCGCGG 
CGCCGTCGAC TGGAGCACCG 
CGACGAGCGT CCGCGCCGGG 
CGTCATCCTG GAACAGCACC 
TGAGGAGTCC CAGCCGCTGC 
CCAGGCGGCC CGGCTGCGCG 
CGGGTACGCG CTGGCCACCA 
CACCCCGGAC GGATTCCGTG 
AGTCGTCACC GGGACCGCTC 
CCAGCGCGCC GGAATGGGGC 
GGACGAGGTC TCCGACGCGT 
CGGCGAACAC GGCGCTCTCG 
CGAAGTGGCG CTGCTGCGGC 
GCACTCCGTC GGCGAGGTGA 
GACGGAGTTG ATCGTGGCCC . 
CGCCGTCGAC GGAAGCCCGG 
CAACGGCCCG TCCGCCGTGG 
GGAGTGGTCG GCGGCCGGGC 
CCGGCACGTC GACGGTGCGC 
CGCGGCGCGG CTGCCGGTGG 
AACGCCCGCG CACTGGCTGC 
GGAGCTGGCC GACCGCGGCG 
GTCGGCCGCG GCGGAGAGCG 
CCGGACCGGT GAGGAGACCG 
CCCGGTCGAC CTGGCCGCGG 
GTTCCAGCAC CGTTCCTACT 
GGACACCGGG GGTCCGGCGG 
TCGGCGCACC GCGGCGCTGC 
GTTCTTCGCG CTCGGTTTCG 
GGCAACCGGG CTGGACCTGC 
CACCGCGTTC CTCCAGGACC 
CGACGACGCG CCCACCGTGC 
CATCGCGGCG ACGCCGGCCC 
CCATACCTGG AAGGACTACC 
CGCTGCCCAT TCGCGATCCA 
TCGTTCGACC TGTTCGGCGT 
GTCACCAACG ATCCGCGGTT 
CCCGGCTGGT TCTCCGGGAT 
GGGGACTTCA CACTGCGCGC 
GCCTGCCTGG ACGACATCGA 
AAGCGGCTGC CCTCCCTCGT 
GTGCTGGAGG CACGGATGCG 
CTGACCGACG ACTTCTTCGG 
GGCGAGGACC TGCTGCACCG 
GACGAGGCGA CGGGCGTGTT 
72121 CGCGACGCTG CTGTTCGCCG 
72181 CGCACTGCTC AGCCACCCCG 
72241 CAACGCGGTC GAGGAGATGC 
72301 CTGTGTCGAG GACGTCGATG 
5 72361 GCTCTACTCG ACGGCCAACC 
.72421 GACGCGCCCG CTGGAGGGCA 
72481 GCACATCGCC CGGGTGCTCA 
72541 CGTCCGGCTG GCCGGCGACG 
72601 GCTGCGGGTC ACCTGGGGGG 

10 72661 GGGACGACGG TCGCGCACAT 
72721 ACCCAGCGCT GCTACCTGCG 
72781 GTCGGCGCGA ACATCGGCAT 
7 2841 GTGCACGCCT TCGAGCCCGC 
72901 CACGGCATCC CGGGCCAGGC 

15 72961 ATGACCTTCT ATCCCGACGC 
73021 ACGGAGCTGT TGCGCACGCT 
73081 ATGCTCGCGC AACTGCCCGA 
73141 GACGTCATCG CGGAGCGCGG 
73201 AGCGAACGGC AGGTCTTCGC 

20 7 3261 GTCGCGGAGG TCCACGACAT 
7 3321 CATGGCTTCA CCGTGGTCGC 
7 3381 GTCGCCGCGC GGCGGGTGGC 
73441 GCGGCGGTGC GGACGGCGGC 
73501 CCCTTCACCC CCAGCTTGCG 

25 7 3561 ACGAACAGCT GGCTGGCGAT 
73621 CGCCGCTCCG CCTCGGTCAG 
73681 TCCGCGTCCG AGGACTCCCC 
73741 GCGAGGTGCC GTGCGCGGCG 
7 3801 CACGCTTCGC CCATGTCGGC 

30 7 3861 AGCAGATCGG CGGCCTCGTC 
73921 TGCACCCGCA GCGTCATCAC 
73981 ATGAGCCTCA GCCCCTCGTC 
74 041 ACCCGCCACA GGGCCAGGCC 
7 4101 TCCCGGAACG CGTTGTACGC 

35 7 4161 GCCCAGACCA TGTGCAGTCC 
74 221 AGCCACCGCT CCGCCCGGTC 
74281 AGCGGCAATG CGGCGGCCAT 
7 4341 CCGCATTCGA CGGCGGCGGT 
7 4 401 GCGTGGACCG CCTCGTCGGC 

40 74 4 61 CAGGACTGGA CGGCATCGGT 
74521 GTGGTCCGGT CCGTCGTGAC 
7 4 581 TGTTCGGACC AGCCGCGCAG 
74 641 ACGGCTCCGG AAAACGAGGC 
74701 TCGGCCGCGC CGGGATAGAT 

45 74761 CCCTGCTCGC TCGGGGCGGC 
74821 CGCCCGTCCA TCGCCAGCCA 
7 4881 TCCCGCGACG CGGTGAGCAG 
74 941 CGCTCGATGG CGGCGGTGTC 
75001 CGGTAGGCGA ACTCCAGGTA 

50 75061 CGCGCGGCGT CGGTGAACAG 
75121 TGGTGGCGGG CGAGCACCTT 
75181 TCGTGCAGGC CACGCCGCTC 
75241 GGGTGCGGGA ACCGCCCTTC 
75301 TCGACCGCCT CGGTGTCGAG 

55 75361 CCGAGCACGG CGGAAGCTCG 
75421 CCGAGGTAGG CGAGCCGGTA 
75481 GTCCGTGCCT CCCGGATGTC 
75541 GCCCGGAACG CCTGGGCCAC 
75601 AGTTCGGTGG TCTGCGCCTC 

60 75661 CTCAGCAGTG CCGCCCGGAA 
75721 ACGATGGCG A CACGGGCCCG 
75781 GGCGCGTCGG CGTGGTGCAC 
75841 GTCAGCACCG TGCGGGTGAG 
75901 TCGCACGATG CCGTCAGCCG 
GCCACGACTC GGTGCAGCAG ATGGTCGGCT ACTGCCTCTA 
AGCAGCAGGC GGCGCTGCGC GCGGGCCCGG AGCTGGTCGA 
TCCGTTTCCT GCCCGTCAAC CAGATGGGCG TACCGCGCGT 
TGCGGGGCGT GCGCATCCGT GCGGGCGACA ACGTGATCCC 
GCGACCCCGA GGTGTTCCCG CAGCCCGACA CCTTCGATGT 
ACTTCGCGTT CGGCCACGGC ATTCACAAGT GTCCCGGCCA 
TCAAGGTCGC CTGCCTGCGG TTGTTCGAGC GTTTCCCGGA 
TGCCGATGAA CGAGGGGCTC GGGCTGTTCA GCCCGGCCGA 
CGGCATGAGT CACCCGGTGG AGACGTTGCG GTTGCCGAAC 
CAACGCGGGC GAGGCGCAGT TCCTCTACCG GGAGATCTTC 
CCACGGTGTC GACCTGCGCC CGGGGGACGT GGTGTTCGAC 
GTTCACGCTT TTCGCGCATC TGGAGTGTCC TGGTGTGACC 
GCCCGTGCCG TTCGCGGCGC TGCGGGCGAA CGTGACGCGG 
GGACCAGTGC GCGGTCTCCG ACAGCTCCGG CACCCGGAAG 
CACGCTGATG TCCGGTTTCC ACGCGGATGC CGC<;GCCCGG 
CGGCCTCAAC GGCGGCTACA CCGCCGAGGA CGTCGACACC 
CGTCAGGGAG GAGATCGAAA CCCCTGTGGT CCGGCTCTCC 
TATCGAGGCC ATCGGCCTGC TGAAGGTCGA CGTGGAGAAG 
CGGCCTCGAG GACACCGACT GGCCCCGTAT CC-GCCAGGTC 
CGACGGCGCG CTCGAGGAGG TCGTCACGCT GCTCCGCGGC 
CGAGCAGGAA CCGCTGTTCG CCGGCACGGG CATCCACCAG 
CGGCTGAGCG CCGTCGGGGC CGCGGCCGTC CGCACCGGCG 
TCAGCCGGCG TCGGACAGTT CCTTGGGCAG TTGCTGACGG 
GAACACGTTG GTGAGGTGCT GTTCCACCGT GCTGGAGGTG 
CTCCTTGTTG GTGCGCCCGA CCGCGGCGTG CGACGCCACC 
CGATGTGATC CGCTGCGCCG GCGTCACGTC CTGGGTGCCG 
ACCGAGCCGC CGGAGGAGCG GCACGGCTCC GCACTGGGTC 
GAACAGTCCC CGCGCACGGC TGTGCCGCCG GAGCATGCCG 
GAGGACGCGG GCCAGCTCGT ACTGGTCGCG GCACATGATG 
GAGCAGTTCG ATCCGCTTGG CCGGCGGACT GTAGGCCGCC 
CCGCGCCCGG GACCCCATCG GCCGGGACAG CTGCTCGGAG 
ACGGCCGCGG C CGAGCAGCA GAAGCGCTTC GGCGGCGTCG 
CGGCAGGTCG ACGGACCAGC GTCGCATCCG CTCCCCGCAG 
CGCCCGGTAC CGCCCGGCCG CGAGATGGTG TTGCCCACGG 
GAAGAGGCTG TCGGAGGTCT CCTCCGGCAA CGGCTCGGCG 
CAGGTCGCCC AGTCGGATCG CGGCGGCCAC GGTGCTGCTC 
CCCCCAGGAG GGCACGACCC GGGGGGCGAG CGCGGCCTCG 
CAGGTCGCCG CGGCGCAGCG CGGCCTCGGC GCGGAACCCC 
CGGGGTCCGC ATGTTGTCGT CACCGGCCAG CTTGTCGACC 
GTCCTCGGCG TAGAGCAGGG CCAGCAACGC CATCATGGTC 
CCGGGAGTGC TGGAGCACGT ACTCGGCTTT GGCCTCGGCC 
CGCGTTGCTC AGGGCCTTGT CGGCGACGGC GCGGTGCCGG 
GACCTCGTCC TCGGCCGGCG GATCGGCCGG ACGCGGCGGA 
CAGCGCGAGG GACAGGTCCG CGACGCGCAG GTGCGCCCGG 
GGAGCGCTGG GCCGCCAGGA CCTCGGCGGC CTCGCCCGGC 
GCAGGCGAGC GACACGGCGT GCTCGCTGGA GAGGAGCCGT 
CTCGGGCACA TGCCGGCCGG ATCTGGCGGG ATCGCAGAGC 
GACGCGCAGT GCGGCGTGGA CGGCGGGGTC GTCGGAGGCC 
GGTGACGGCC TCGTCGAGCT CGCCGCGCAG GTGGTGCTCG 
CCCGGCGACC TCGGCGCCGT GCACCCGGCC GGTACCCATC 
GCTGGCCACG CCGCGGTCCC GCAGCAGTTC CAGCGCCAGC 
CGCGGCGGAG AGGTCGTCGA GTACGACGGA GCGGGCCGCG 
CCGCAGCAGC CGCCCCTCGA CCAGCTGTTC GTGGGCCTGC 
GCCGGTCATC CGCTGGACGA GGGTGAGTTC GACACTCTCG 
GGCGACGCTC AGCGCGGCCG <3GCCGCAACG ATAGAGCGAC 
CGCCCGCCCC GCGACCACTT CCAGGCACCC TGAGGTCCGT 
GTCGATCAGG CCGTGGCCGA GGAGCAGGTT GCCGCCGGTC 
CACGTCGTCG TGCGCGTCCT GGCCGAGGTG CCGGGGCACG 
GGTGAGCGGG CGCAGCGCGA TCTCCTGGTA GTGGCGCAGA 
TTGGGAGTGG GCGGGCGTCG GCCGGAGCAG CTCGGTCAGC 
<3CTGATGCGG CGCGCGAGGT GGAGCAGGCA GCGCAGCGAC 
GTCGTCGATG CCGATCAGTA CGGGCCGCTC CGCGGCGAGC 
TTCGGTCCCC AGGCGGTTGT CGACGTCGGC CCGCAGGTTT 
GACCAGCTCC GGTGTCCGGG CGGCCAGCTC CGGCTGGTCG 
75961 AGGAGCTGGC CGAGCATGCC GTACGGCAGG GCCCGCTCCT CCATGGAGCA CACCGCGCGA 
76021 AGGGTGACGA AGCCGGCCTT GGCCGCGGCG GCGTCGAGGA GTTCGGTCTT GCCGCAGGCG 
76081 ATCGGCCCGG TGACGGCGGC GACGACGCCC CGCCCGCCCC CCGCTCGGGT GAGCGCCCGG 
76141 TGGAGGGAAC CGAACTCGTC ATCGCGGGCG ATCAGGTCTG GGGGAGATAA GCGCGCTATC 
76201 ACGAATGGAA CTACCTCGCG ACCGTCGTGG AAACCCATAG GCATCACATG GCTTGTTGAT 
76261 CTGTACGGCT GTGATTCAGC CTGGCGGGAT GCTGTGCTAC AGATGGGAAG ATGTGATCTA 
76321 GGGCCGTGCC GTTCCCTCAG GAGCCGACCG CCCCCGGCGC CACCCGCCGT ACCCCCTGGG 
76381 CCACCAGCTC GGCGACCCGC TCCTGGTGGT CGACGAGGTA GAAGTGCCCG CCGGGGAAGA 
76441 CCTCCACGGT GGTCGGCGCG GTCGTGTGCC CGGCCCAGGC GTGGGCCTGC TCCACCGTCG 
76501 TCTTCGGATC GTCGTCACCG ATGCACACCG TGATCGGCGT CTCCAGCGGC GGCGCGGGCT 
76561 CCCACCGGTA CGTCTCCGCC GCGTAGTAGT CCGCCCGCAA CGGCGCCAGG ATCAGCGCGC 
76621 GCATTTCGTC GTCCGCCATC ACATCGGCGC TCGTCCCGCC GAGGCCGATG ACCGCCGCCA 
76681 GCAGCTCGTC GTCGGACGCG AGGTGGTCCT GGTCGGCGCG CGGCTGCGAC GGCGCCCGCC 
76741 GGCCCGAGAC GATCAGGTGC GCCACCGGGA GCCGCTGGGC CAGCTCGAAC GCGAGTGTCG 
76801 CGCCCATGCT GTGGCCGAAC AGCACCAGCG GACGGTCCAG CCCCGGCTTC AACGCCTCGG 
76861 CCACGAGGCC GGCGAGAACA CGCAGGTCGC GCACCGCCTC CTCGTCGCGG CGGTCCTGGC 
76921 GGCCGGGGTA CTGCACGGCG TACACGTCCG CCACCGGGGC GAGCGCACGG GCCAGCGGAA 
76981 GGTAGAACGT CGCCGATCCG CCGGCGTGGG GCAGCAGCAC CACCCGTACC GGGGCCTCGG 
77041 GCGTGGGGAA GAACTGCCGC AGCCAGAGTT CCGAGCTCAC CGCACCCCCT CGGCCGCGAC 
77101 CTGGGGAGCC CGGAACCGGG TGATCTCGGC CAAGTGCTTC TCCCGCATCT CCGGGTCGGT 
77161 CACGCCCCAT CCCTCCTCCG GCGCCAGACA GAGGACGCCG ACTTTGCCGT TGTGCACATT 
77221 GCGATGCACA TCGCGCACCG CCGACCCGAC GTCGTCGAGC GGGTAGGTCA CCGACAGCGT 
77281 CGGGTGCACC ATCCCCTTGC AGATCAGGCG GTTCGCCTCC CACGCCTCAC GATAGTTCGC 
77341 GAAGTGGGTA CCGATGATCC GCTTCACGGA CATCCACAGG TACCGATTGT CAAAGGCGTG 
77401 CTCGTATCCC GAGGTTGACG CGCAGGTGAC GATCGTGCCA CCCCGACGTG TCACGT^GAC 
77461 ACTCGCGCCG AACGTCGCGC GCCCCGGGTG CTCGAACACG ATGTCGGGAT CGTCACCGCC 
77521 GGTCAGCTCC CGGATC 

Those of skill in the art will recognize that, due to the degenerate nature of the 
genetic code, a variety of DNA compounds differing in their nucleotide sequences can be 
used to encode a given amino acid sequence of the invention. The native DNA sequence 
encoding the FK-520 PKS of Streptomyces hygroscopicus is shown herein merely to 
illustrate a preferred embodiment of the invention, and the present invention includes 
DNA compounds of any sequence that encode the amino acid sequences of the 
polypeptides and proteins of the invention. In similar fashion, a polypeptide can typically 
tolerate one or more amino acid substitutions, deletions, and insertions in its amino acid 
sequence without loss or significant loss of a desired activity. The present invention 
includes such polypeptides with alternate amino acid sequences, and the amino acid 
sequences shown merely illustrate preferred embodiments of the invention. 
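
As a minimal sketch of the degeneracy described above, the following Python snippet translates two different DNA sequences with the standard genetic code and shows that they encode the same peptide. The two 15-nucleotide sequences and the peptide are hypothetical illustrations and are not taken from the FK-520 sequence listing.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (length divisible by 3) to one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical example sequences (assumptions for illustration only).
seq_a = "ATGGGCTCGCTGGCC"  # GC-rich codon choices
seq_b = "ATGGGTAGCTTAGCT"  # alternative synonymous codons

peptide_a = translate(seq_a)
peptide_b = translate(seq_b)
print(peptide_a, peptide_b, seq_a != seq_b)
```

Both sequences differ at several nucleotide positions yet translate to the same five-residue peptide, which is the sense in which DNA compounds of differing sequence can encode a given amino acid sequence.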

The recombinant nucleic acids, proteins, and peptides of the invention are many 
and diverse. To facilitate an understanding of the invention and the diverse compounds 
and methods provided thereby, the following general description of the FK-520 PKS 
genes and modules of the PKS proteins encoded thereby is provided. This general 
description is followed by a more detailed description of the various domains and 
modules of the FK-520 PKS contained in and encoded by the compounds of the 
invention. In this description, reference to a heterologous PKS refers to any PKS other 
than the FK-520 PKS. Unless otherwise indicated, reference to a PKS includes reference 
to a portion of a PKS. Moreover, reference to a domain, module, or PKS includes 
reference to the nucleic acids encoding the same and vice-versa, because the methods 
and reagents of the invention provide or enable one to prepare proteins and the nucleic 
acids that encode them. 

The FK-520 PKS is composed of three proteins encoded by three genes 
designated fkbA, fkbB, and fkbC. The fkbA ORF encodes extender modules 7-10 of the 
PKS. The fkbB ORF encodes the loading module (the CoA ligase) and extender modules 
1-4 of the PKS. The fkbC ORF encodes extender modules 5-6 of the PKS. The fkbP 
ORF encodes the NRPS that attaches the pipecolic acid and cyclizes the FK-520 
polyketide. 
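
The gene-to-module assignment described above can be sketched as a small lookup table. This is an illustrative data structure only: the gene names are written here as fkbA, fkbB, and fkbC, the module labels are hypothetical strings, and the ordering list assumes that module numbering reflects the order in which the modules act.

```python
from typing import Dict, List

# Modules encoded by each PKS gene, as stated in the description.
GENE_TO_MODULES: Dict[str, List[str]] = {
    "fkbB": ["loading"] + [f"extender {i}" for i in range(1, 5)],  # loading + 1-4
    "fkbC": [f"extender {i}" for i in range(5, 7)],                # 5-6
    "fkbA": [f"extender {i}" for i in range(7, 11)],               # 7-10
}

# Assumption: genes listed in the order implied by module numbering.
ASSEMBLY_ORDER: List[str] = ["fkbB", "fkbC", "fkbA"]


def modules_in_order() -> List[str]:
    """Return all modules from the loading module through extender module 10."""
    return [m for gene in ASSEMBLY_ORDER for m in GENE_TO_MODULES[gene]]


def gene_for_module(module: str) -> str:
    """Look up which gene encodes a given module label."""
    for gene, modules in GENE_TO_MODULES.items():
        if module in modules:
            return gene
    raise KeyError(module)


print(modules_in_order())
print(gene_for_module("extender 6"))
```

The fkbP gene is left out of the table because it encodes the NRPS rather than a PKS module.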

The loading module of the FK-520 PKS includes a CoA ligase, an ER domain, 
and an ACP domain. The starter building block or unit for FK-520 is believed to be a 
dihydroxycyclohexene carboxylic acid, which is derived from shikimate. The 
recombinant DNA compounds of the invention that encode the loading module of the 
FK-520 PKS and the corresponding polypeptides encoded thereby are useful for a 
variety of methods and in a variety of compounds. In one embodiment, a DNA 
compound comprising a sequence that encodes the FK-520 loading module is inserted 
into a DNA compound that comprises the coding sequence for a heterologous PKS. The 
resulting construct, in which the coding sequence for the loading module of the 
heterologous PKS is replaced by the coding sequence for the FK-520 loading module, 
provides a novel PKS coding sequence. Examples of heterologous PKS coding 
sequences include the rapamycin, FK-506, rifamycin, and avermectin PKS coding 
sequences. In another embodiment, a DNA compound comprising a sequence that 
encodes the FK-520 loading module is inserted into a DNA compound that comprises the 
coding sequence for the FK-520 PKS or a recombinant FK-520 PKS that produces an 
FK-520 derivative. 

In another embodiment, a portion of the loading module coding sequence is 
utilized in conjunction with a heterologous coding sequence. In this embodiment, the 
invention provides, for example, either replacing the CoA ligase with a different CoA
ligase, deleting the ER, or replacing the ER with a different ER. In addition, or
alternatively, the ACP can be replaced by another ACP. In similar fashion, the
corresponding domains in another loading or extender module can be replaced by one or
more domains of the FK-520 PKS. The resulting heterologous loading module coding
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes FK-520, an FK-520 derivative, or another polyketide.

The first extender module of the FK-520 PKS includes a KS domain, an AT
domain specific for methylmalonyl CoA, a DH domain, a KR domain, and an ACP 
domain. The recombinant DNA compounds of the invention that encode the first 
extender module of the FK-520 PKS and the corresponding polypeptides encoded 
thereby are useful for a variety of applications. In one embodiment, a DNA compound
comprising a sequence that encodes the FK-520 first extender module is inserted into a
DNA compound that comprises the coding sequence for a heterologous PKS. The
resulting construct, in which the coding sequence for a module of the heterologous PKS
is either replaced by that for the first extender module of the FK-520 PKS or the latter is
merely added to coding sequences for modules of the heterologous PKS, provides a
novel PKS coding sequence. In another embodiment, a DNA compound comprising a 
sequence that encodes the first extender module of the FK-520 PKS is inserted into a 
DNA compound that comprises the remainder of the coding sequence for the FK-520 
PKS or a recombinant FK-520 PKS that produces an FK-520 derivative. 

In another embodiment, all or only a portion of the first extender module coding
sequence is utilized in conjunction with other PKS coding sequences to create a hybrid
module. In this embodiment, the invention provides, for example, either replacing the
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting either the DH or KR or both; replacing the
DH or KR or both with another DH or KR; and/or inserting an ER. In replacing or
inserting KR, DH, and ER domains, it is often beneficial to replace the existing KR, DH,
and ER domains with the complete set of domains desired from another module. Thus, if
one desires to insert an ER domain, one may simply replace the existing KR and DH
domains with a KR, DH, and ER set of domains from a module containing such
domains. In addition, the KS and/or ACP can be replaced with another KS and/or ACP.
In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or
ACP coding sequence can originate from a coding sequence for another module of the
FK-520 PKS, from a gene for a PKS that produces a polyketide other than FK-520, or
from chemical synthesis. The resulting heterologous first extender module coding
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes FK-520, an FK-520 derivative, or another polyketide. In similar fashion, the
corresponding domains in a module of a heterologous PKS can be replaced by one or 
more domains of the first extender module of the FK-520 PKS. 
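
The effect of exchanging the set of KR, DH, and ER domains in a module can be illustrated with a short Python sketch. It rests on the generally accepted model of modular PKS function, in which the oxidation state of the beta-carbon is determined by which of these domains are active; that model, the class name Module, and the function names are assumptions of this sketch and are not recited in the text above.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Module:
    """Hypothetical model of one extender module (names are illustrative)."""
    name: str
    at_substrate: str
    reductive: FrozenSet[str]  # ACTIVE domains among {"KR", "DH", "ER"}


def beta_carbon_state(module: Module) -> str:
    """Predict the beta-carbon functionality under the general PKS model.

    A DH can act only on the product of a KR, and an ER only on the
    product of a DH, so the domains are checked in that order.
    """
    active = module.reductive
    if "KR" not in active:
        return "keto"
    if "DH" not in active:
        return "hydroxyl"
    if "ER" not in active:
        return "double bond"
    return "methylene"


def swap_reductive_set(module: Module, new_set: Iterable[str]) -> Module:
    """Return a copy of the module whose KR/DH/ER set is replaced wholesale."""
    return replace(module, reductive=frozenset(new_set))


if __name__ == "__main__":
    # First extender module as described above: methylmalonyl AT, DH, and KR.
    module_1 = Module("extender module 1", "methylmalonyl CoA",
                      frozenset({"KR", "DH"}))
    print(beta_carbon_state(module_1))

    # "Inserting an ER" by substituting a complete KR, DH, and ER set.
    with_er = swap_reductive_set(module_1, {"KR", "DH", "ER"})
    print(beta_carbon_state(with_er))

    # Deleting both the DH and the KR.
    no_reduction = swap_reductive_set(module_1, ())
    print(beta_carbon_state(no_reduction))
```

Replacing the whole reductive set in one step, rather than editing individual domains, mirrors the approach described above of substituting the complete set of domains desired from another module.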

In an illustrative embodiment of this aspect of the invention, the invention
provides recombinant PKSs and recombinant DNA compounds and vectors that encode
such PKSs in which the KS domain of the first extender module has been inactivated.
Such constructs are especially useful when placed in translational reading frame with the 
remaining modules and domains of an FK-520 or FK-520 derivative PKS. The utility of 
these constructs is that host cells expressing, or cell free extracts containing, the PKS 
encoded thereby can be fed or supplied with N-acylcysteamine thioesters of novel 
precursor molecules to prepare FK-520 derivatives. See U.S. patent application Serial 
No. 60/117,384, filed 27 Jan. 1999, and PCT patent publication Nos. US97/02358 and
US99/03986, each of which is incorporated herein by reference. 

The second extender module of the FK-520 PKS includes a KS, an AT specific 
for methylmalonyl CoA, a KR, an inactive DH, and an ACP. The recombinant DNA 
compounds of the invention that encode the second extender module of the FK-520 PKS 
and the corresponding polypeptides encoded thereby are useful for a variety of 
applications. In one embodiment, a DNA compound comprising a sequence that encodes 
the FK-520 second extender module is inserted into a DNA compound that comprises the 
coding sequence for a heterologous PKS. The resulting construct, in which the coding 
sequence for a module of the heterologous PKS is either replaced by that for the second 
extender module of the FK-520 PKS or the latter is merely added to coding sequences 
for the modules of the heterologous PKS, provides a novel PKS coding sequence. In 
another embodiment, a DNA compound comprising a sequence that encodes the second 
extender module of the FK-520 PKS is inserted into a DNA compound that comprises 
the coding sequence for the remainder of the FK-520 PKS or a recombinant FK-520 PKS 
that produces an FK-520 derivative. 

In another embodiment, all or a portion of the second extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a hybrid 
module. In this embodiment, the invention provides, for example, either replacing the 
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or 2-
hydroxymalonyl CoA specific AT; deleting the KR and/or the inactive DH; replacing the
KR with another KR; and/or inserting an active DH or an active DH and an ER. In 
addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of 
these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding 
sequence can originate from a coding sequence for another module of the FK-520 PKS, 
from a coding sequence for a PKS that produces a polyketide other than FK-520, or from 
chemical synthesis. The resulting heterologous second extender module coding sequence 
can be utilized in conjunction with a coding sequence from a PKS that synthesizes
FK-520, an FK-520 derivative, or another polyketide. In similar fashion, the corresponding
domains in a module of a heterologous PKS can be replaced by one or more domains of
the second extender module of the FK-520 PKS. 

The third extender module of the FK-520 PKS includes a KS, an AT specific for 
malonyl CoA, a KR, an inactive DH, and an ACP. The recombinant DNA compounds of 
the invention that encode the third extender module of the FK-520 PKS and the
corresponding polypeptides encoded thereby are useful for a variety of applications. In
one embodiment, a DNA compound comprising a sequence that encodes the FK-520
third extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding sequence
for a module of the heterologous PKS is either replaced by that for the third extender
module of the FK-520 PKS or the latter is merely added to coding sequences for the
modules of the heterologous PKS, provides a novel PKS coding sequence. In another
embodiment, a DNA compound comprising a sequence that encodes the third extender
module of the FK-520 PKS is inserted into a DNA compound that comprises the coding
sequence for the remainder of the FK-520 PKS or a recombinant FK-520 PKS that
produces an FK-520 derivative. 

In another embodiment, all or a portion of the third extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a hybrid 
module. In this embodiment, the invention provides, for example, either replacing the
malonyl CoA specific AT with a methylmalonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting the KR and/or the inactive DH; replacing the
KR with another KR; and/or inserting an active DH or an active DH and an ER. In
addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of
these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding
sequence can originate from a coding sequence for another module of the FK-520 PKS,
from a coding sequence for a PKS that produces a polyketide other than FK-520, or from
chemical synthesis. The resulting heterologous third extender module coding sequence
can be utilized in conjunction with a coding sequence from a PKS that synthesizes
FK-520, an FK-520 derivative, or another polyketide. In similar fashion, the corresponding
domains in a module of a heterologous PKS can be replaced by one or more domains of
the third extender module of the FK-520 PKS. 

The fourth extender module of the FK-520 PKS includes a KS, an AT that binds 
ethylmalonyl CoA, an inactive DH, and an ACP. The recombinant DNA compounds of 
the invention that encode the fourth extender module of the FK-520 PKS and the
corresponding polypeptides encoded thereby are useful for a variety of applications. In
one embodiment, a DNA compound comprising a sequence that encodes the FK-520
fourth extender module is inserted into a DNA compound that comprises the coding 
sequence for a heterologous PKS. The resulting construct, in which the coding sequence 
for a module of the heterologous PKS is either replaced by that for the fourth extender 
module of the FK-520 PKS or the latter is merely added to coding sequences for the 
modules of the heterologous PKS, provides a novel PKS coding sequence. In another 
embodiment, a DNA compound comprising a sequence that encodes the fourth extender 
module of the FK-520 PKS is inserted into a DNA compound that comprises the 
remainder of the coding sequence for the FK-520 PKS or a recombinant FK-520 PKS 
that produces an FK-520 derivative. 

In another embodiment, a portion of the fourth extender module coding sequence 
is utilized in conjunction with other PKS coding sequences to create a hybrid module. In 
this embodiment, the invention provides, for example, either replacing the ethylmalonyl 
CoA specific AT with a malonyl CoA, methylmalonyl CoA, or 2-hydroxymalonyl CoA 
specific AT; and/or deleting the inactive DH, inserting a KR, a KR and an active DH, or 
a KR, an active DH, and an ER. In addition, the KS and/or ACP can be replaced with 
another KS and/or ACP. In each of these replacements or insertions, the heterologous 
KS, AT, DH, KR, ER, or ACP coding sequence can originate from a coding sequence for
another module of the FK-520 PKS, a PKS for a polyketide other than FK-520, or from 
chemical synthesis. The resulting heterologous fourth extender module coding sequence 
can be utilized in conjunction with a coding sequence for a PKS that synthesizes FK-520, 
an FK-520 derivative, or another polyketide. In similar fashion, the corresponding 
domains in a module of a heterologous PKS can be replaced by one or more domains of 
the fourth extender module of the FK-520 PKS. 

As illustrative examples, the present invention provides recombinant genes, 
vectors, and host cells that result from the conversion of the FK-506 PKS to an FK-520 
PKS and vice-versa. In one embodiment, the invention provides a recombinant set of 
FK-506 PKS genes but in which the coding sequences for the fourth extender module or 
at least those for the AT domain in the fourth extender module have been replaced by 
those for the AT domain of the fourth extender module of the FK-520 PKS. This
recombinant PKS can be used to produce FK-520 in recombinant host cells. In another 
embodiment, the invention provides a recombinant set of FK-520 PKS genes but in 
which the coding sequences for the fourth extender module or at least those for the AT 
domain in the fourth extender module have been replaced by those for the AT domain of
the fourth extender module of the FK-506 PKS. This recombinant PKS can be used to
produce FK-506 in recombinant host cells. 

Other examples of hybrid PKS enzymes of the invention include those in which
the AT domain of module 4 has been replaced with a malonyl specific AT domain to
provide a PKS that produces 21-desethyl-FK520 or with a methylmalonyl specific AT
domain to provide a PKS that produces 21-desethyl-21-methyl-FK520. Another hybrid
PKS of the invention is prepared by replacing the AT and inactive KR domain of FK-520
extender module 4 with a methylmalonyl specific AT and an active KR domain, such as,
for example, from module 2 of the DEBS or oleandolide PKS enzymes, to produce
21-desethyl-21-methyl-22-desoxo-22-hydroxy-FK520. The compounds produced by these
hybrid PKS enzymes are neurotrophins.

The fifth extender module of the FK-520 PKS includes a KS, an AT that binds 
methylmalonyl CoA, a DH, a KR, and an ACP. The recombinant DNA compounds of 
the invention that encode the fifth extender module of the FK-520 PKS and the
corresponding polypeptides encoded thereby are useful for a variety of applications. In
one embodiment, a DNA compound comprising a sequence that encodes the FK-520
fifth extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding sequence
for a module of the heterologous PKS is either replaced by that for the fifth extender
module of the FK-520 PKS or the latter is merely added to coding sequences for the
modules of the heterologous PKS, provides a novel PKS. In another embodiment, a 
DNA compound comprising a sequence that encodes the fifth extender module of the 
FK-520 PKS is inserted into a DNA compound that comprises the coding sequence for 
the FK-520 PKS or a recombinant FK-520 PKS that produces an FK-520 derivative. 

In another embodiment, a portion of the fifth extender module coding sequence is
utilized in conjunction with other PKS coding sequences to create a hybrid module. In
this embodiment, the invention provides, for example, either replacing the
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; deleting either or both of the DH and KR; replacing
either or both of the DH and KR with another DH and/or KR; and/or inserting an ER.
In addition, the KS and/or ACP can be replaced with another KS and/or ACP. In each of
these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding
sequence can originate from a coding sequence for another module of the FK-520 PKS,
from a coding sequence for a PKS that produces a polyketide other than FK-520, or from
chemical synthesis. The resulting heterologous fifth extender module coding sequence
can be utilized in conjunction with a coding sequence for a PKS that synthesizes FK-520,
an FK-520 derivative, or another polyketide. In similar fashion, the corresponding
domains in a module of a heterologous PKS can be replaced by one or more domains of
the fifth extender module of the FK-520 PKS.

In an illustrative embodiment, the present invention provides a set of recombinant
FK-520 PKS genes in which the coding sequences for the DH domain of the fifth
extender module have been deleted or mutated to render the DH non-functional. In one
such mutated gene, the KR and DH coding sequences are replaced with those encoding
only a KR domain from another PKS gene. The resulting PKS genes code for the
expression of an FK-520 PKS that produces an FK-520 analog that lacks the C-19 to
C-20 double bond of FK-520 and has a C-20 hydroxyl group. Such analogs are preferred
neurotrophins, because they have little or no immunosuppressant activity. This
recombinant fifth extender module coding sequence can be combined with other coding
sequences to make additional compounds of the invention. In an illustrative embodiment,
the present invention provides a recombinant FK-520 PKS that contains both this fifth
extender module and the recombinant fourth extender module described above that
comprises the coding sequence for the fourth extender module AT domain of the FK-506
PKS. The invention also provides recombinant host cells derived from FK-506
producing host cells that have been mutated to prevent production of FK-506 but that
express this recombinant PKS and so synthesize the corresponding (lacking the C-19 to
C-20 double bond of FK-506 and having a C-20 hydroxyl group) FK-506 derivative. In
another embodiment, the present invention provides a recombinant FK-506 PKS in
which the DH domain of module 5 has been deleted or otherwise rendered inactive and
thus produces this novel polyketide.

The sixth extender module of the FK-520 PKS includes a KS, an AT specific for
methylmalonyl CoA, a KR, a DH, an ER, and an ACP. The recombinant DNA
compounds of the invention that encode the sixth extender module of the FK-520 PKS
and the corresponding polypeptides encoded thereby are useful for a variety of
applications. In one embodiment, a DNA compound comprising a sequence that encodes
the FK-520 sixth extender module is inserted into a DNA compound that comprises the
coding sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the sixth
extender module of the FK-520 PKS or the latter is merely added to coding sequences
for the modules of the heterologous PKS, provides a novel PKS coding sequence. In
another embodiment, a DNA compound comprising a sequence that encodes the sixth
extender module of the FK-520 PKS is inserted into a DNA compound that comprises
the coding sequence for the remainder of the FK-520 PKS or a recombinant FK-520 PKS
that produces an FK-520 derivative.

In another embodiment, a portion of the sixth extender module coding sequence 
is utilized in conjunction with other PKS coding sequences to create a hybrid module. In 
this embodiment, the invention provides, for example, either replacing the 
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or 2- 
hydroxymalonyl CoA specific AT; deleting any one, two, or all three of the KR, DH, and 
ER; and/or replacing any one, two, or all three of the KR, DH, and ER with another KR, 
DH, and ER. In addition, the KS and/or ACP can be replaced with another KS and/or 
ACP. In each of these replacements, the heterologous KS, AT, DH, KR, ER, or ACP 
coding sequence can originate from a coding sequence for another module of the FK-520 
PKS, from a coding sequence for a PKS that produces a polyketide other than FK-520, or 
from chemical synthesis. The resulting heterologous sixth extender module coding 
sequence can be utilized in conjunction with a coding sequence for a PKS that 
synthesizes FK-520, an FK-520 derivative, or another polyketide. In similar fashion, the 
corresponding domains in a module of a heterologous PKS can be replaced by one or 
more domains of the sixth extender module of the FK-520 PKS. 

In an illustrative embodiment, the present invention provides a set of recombinant 
FK-520 PKS genes in which the coding sequences for the DH and ER domains of the 
sixth extender module have been deleted or mutated to render them non-functional. In 
one such mutated gene, the KR, ER, and DH coding sequences are replaced with those 
encoding only a KR domain from another PKS gene. This can also be accomplished by 
simply replacing the coding sequences for extender module six with those for an 
extender module having a methylmalonyl specific AT and only a KR domain from a 
heterologous PKS gene, such as, for example, the coding sequences for extender module 
two encoded by the eryAI gene. The resulting PKS genes code for the expression of an 
FK-520 PKS that produces an FK-520 analog that has a C-18 hydroxyl group. Such
analogs are preferred neurotrophins, because they have little or no immunosuppressant 
activity. This recombinant sixth extender module coding sequence can be combined with 
other coding sequences to make additional compounds of the invention. In an illustrative 
embodiment, the present invention provides a recombinant FK-520 PKS that contains 
both this sixth extender module and the recombinant fourth extender module described 
above that comprises the coding sequence for the fourth extender module AT domain of 
the FK-506 PKS. The invention also provides recombinant host cells derived from
FK-506 producing host cells that have been mutated to prevent production of FK-506 but
that express this recombinant PKS and so synthesize the corresponding (having a C-18
hydroxyl group) FK-506 derivative. In another embodiment, the present invention
provides a recombinant FK-506 PKS in which the DH and ER domains of module 6 have
been deleted or otherwise rendered inactive and thus produces this novel polyketide.

The seventh extender module of the FK-520 PKS includes a KS, an AT specific 
for 2-hydroxymalonyl CoA, a KR, a DH, an ER, and an ACP. The recombinant DNA 
compounds of the invention that encode the seventh extender module of the FK-520 PKS
and the corresponding polypeptides encoded thereby are useful for a variety of
applications. In one embodiment, a DNA compound comprising a sequence that encodes
the FK-520 seventh extender module is inserted into a DNA compound that comprises
the coding sequence for a heterologous PKS. The resulting construct, in which the
coding sequence for a module of the heterologous PKS is either replaced by that for the
seventh extender module of the FK-520 PKS or the latter is merely added to coding
sequences for the modules of the heterologous PKS, provides a novel PKS coding
sequence. In another embodiment, a DNA compound comprising a sequence that
encodes the seventh extender module of the FK-520 PKS is inserted into a DNA
compound that comprises the coding sequence for the remainder of the FK-520 PKS or a
recombinant FK-520 PKS that produces an FK-520 derivative.

In another embodiment, a portion or all of the seventh extender module coding
sequence is utilized in conjunction with other PKS coding sequences to create a hybrid
module. In this embodiment, the invention provides, for example, either replacing the
2-hydroxymalonyl CoA specific AT with a methylmalonyl CoA, ethylmalonyl CoA, or
malonyl CoA specific AT; deleting the KR, the DH, and/or the ER; and/or replacing the
KR, DH, and/or ER. In addition, the KS and/or ACP can be replaced with another KS
and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH,
KR, ER, or ACP coding sequence can originate from a coding sequence for another
module of the FK-520 PKS, from a coding sequence for a PKS that produces a
polyketide other than FK-520, or from chemical synthesis. The resulting heterologous
seventh extender module coding sequence can be utilized in conjunction with a coding
sequence for a PKS that synthesizes FK-520, an FK-520 derivative, or another
polyketide. In similar fashion, the corresponding domains in a module of a heterologous
PKS can be replaced by one or more domains of the seventh extender module of the
FK-520 PKS.

In an illustrative embodiment, the present invention provides a set of recombinant
FK-520 PKS genes in which the coding sequences for the AT domain of the seventh
extender module have been replaced with those encoding an AT domain for malonyl,
methylmalonyl, or ethylmalonyl CoA from another PKS gene. The resulting PKS genes 
code for the expression of an FK-520 PKS that produces an FK-520 analog that lacks the 
C-15 methoxy group, having instead a hydrogen, methyl, or ethyl group at that position, 
respectively. Such analogs are preferred, because they are more slowly metabolized than 
FK-520. This recombinant seventh extender module coding sequence can be combined 
with other coding sequences to make additional compounds of the invention. In an 
illustrative embodiment, the present invention provides a recombinant FK-520 PKS that 
contains both this seventh extender module and the recombinant fourth extender module 
described above that comprises the coding sequence for the fourth extender module AT 
domain of the FK-506 PKS. The invention also provides recombinant host cells derived
from FK-506 producing host cells that have been mutated to prevent production of FK- 
506 but that express this recombinant PKS and so synthesize the corresponding (C-15- 
desmethoxy) FK-506 derivative. In another embodiment, the present invention provides 
a recombinant FK-506 PKS in which the AT domain of module 7 has been replaced and 
thus produces this novel polyketide. 
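
The relationship between AT specificity and the substituent appearing in the product, as recited above for extender modules 4 and 7, can be sketched in Python as follows. The table and function names are hypothetical; the sketch records only the substituents stated in the text (ethyl at C-21 and methoxy at C-15 in FK-520; hydrogen, methyl, or ethyl after replacement with a malonyl, methylmalonyl, or ethylmalonyl specific AT, respectively) and is not a structure prediction tool.

```python
# Substituent found in the final compound for each AT specificity, per the text.
# "methoxy" is the group the text reports at C-15 of FK-520 (2-hydroxymalonyl AT).
# All names below are assumptions made for this sketch only.
SIDE_CHAIN = {
    "malonyl CoA": "hydrogen",
    "methylmalonyl CoA": "methyl",
    "ethylmalonyl CoA": "ethyl",
    "2-hydroxymalonyl CoA": "methoxy",
}

# Extender module number -> (ring position, native AT specificity in FK-520).
NATIVE_AT = {
    4: ("C-21", "ethylmalonyl CoA"),
    7: ("C-15", "2-hydroxymalonyl CoA"),
}


def describe_at_swap(module: int, new_unit: str) -> str:
    """Describe the substituent change expected from replacing a module's AT."""
    position, native_unit = NATIVE_AT[module]
    if new_unit == native_unit:
        raise ValueError("replacement AT has the same specificity as the native AT")
    return f"{position}: {SIDE_CHAIN[native_unit]} -> {SIDE_CHAIN[new_unit]}"


if __name__ == "__main__":
    print(describe_at_swap(4, "malonyl CoA"))        # 21-desethyl analog
    print(describe_at_swap(7, "methylmalonyl CoA"))  # 15-desmethoxy-15-methyl analog
```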

In another illustrative embodiment, the present invention provides a hybrid PKS
in which the AT and KR domains of module 7 of the FK-520 PKS are replaced by a 
methylmalonyl specific AT domain and an inactive KR domain, such as, for example, 
the AT and KR domains of extender module 6 of the rapamycin PKS. The resulting 
hybrid PKS produces 15-desmethoxy-15-methyl-16-oxo-FK-520, a neurotrophin 
compound. 

The eighth extender module of the FK-520 PKS includes a KS, an AT specific 
for 2-hydroxymalonyl CoA, a KR, and an ACP. The recombinant DNA compounds of 
the invention that encode the eighth extender module of the FK-520 PKS and the 
corresponding polypeptides encoded thereby are useful for a variety of applications. In 
one embodiment, a DNA compound comprising a sequence that encodes the FK-520 
eighth extender module is inserted into a DNA compound that comprises the coding 
sequence for a heterologous PKS. The resulting construct, in which the coding sequence 
for a module of the heterologous PKS is either replaced by that for the eighth extender 
module of the FK-520 PKS or the latter is merely added to coding sequences for the 
modules of the heterologous PKS, provides a novel PKS coding sequence. In another 
embodiment, a DNA compound comprising a sequence that encodes the eighth extender
module of the FK-520 PKS is inserted into a DNA compound that comprises the coding
sequence for the remainder of the FK-520 PKS or a recombinant FK-520 PKS that 
produces an FK-520 derivative. 

In another embodiment, a portion of the eighth extender module coding sequence
is utilized in conjunction with other PKS coding sequences to create a hybrid module. In
this embodiment, the invention provides, for example, either replacing the
2-hydroxymalonyl CoA specific AT with a methylmalonyl CoA, ethylmalonyl CoA, or
malonyl CoA specific AT; deleting or replacing the KR; and/or inserting a DH or a DH
and an ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP.
In each of these replacements, the heterologous KS, AT, DH, KR, ER, or ACP coding
sequence can originate from a coding sequence for another module of the FK-520 PKS,
from a coding sequence for a PKS that produces a polyketide other than FK-520, or from
chemical synthesis. The resulting heterologous eighth extender module coding sequence
can be utilized in conjunction with a PKS that synthesizes FK-520, an FK-520
derivative, or another polyketide. In similar fashion, the corresponding domains in a
module of a heterologous PKS can be replaced by one or more domains of the eighth
extender module of the FK-520 PKS.

In an illustrative embodiment, the present invention provides a set of recombinant
FK-520 PKS genes in which the coding sequences for the AT domain of the eighth
extender module have been replaced with those encoding an AT domain for malonyl,
methylmalonyl, or ethylmalonyl CoA from another PKS gene. The resulting PKS genes
code for the expression of an FK-520 PKS that produces an FK-520 analog that lacks the
C-13 methoxy group, having instead a hydrogen, methyl, or ethyl group at that position,
respectively. Such analogs are preferred, because they are more slowly metabolized than
FK-520. This recombinant eighth extender module coding sequence can be combined
with other coding sequences to make additional compounds of the invention. In an
illustrative embodiment, the present invention provides a recombinant FK-520 PKS that
contains both this eighth extender module and the recombinant fourth extender module
described above that comprises the coding sequence for the fourth extender module AT
domain of the FK-506 PKS. The invention also provides recombinant host cells derived
from FK-506 producing host cells that have been mutated to prevent production of
FK-506 but that express this recombinant PKS and so synthesize the corresponding
(C-13-desmethoxy) FK-506 derivative. In another embodiment, the present invention
provides a recombinant FK-506 PKS in which the AT domain of module 8 has been
replaced and thus produces this novel polyketide.

The ninth extender module of the FK-520 PKS includes a KS, an AT specific for
methylmalonyl CoA, a KR, a DH, an ER, and an ACP. The recombinant DNA 
compounds of the invention that encode the ninth extender module of the FK-520 PKS 
and the corresponding polypeptides encoded thereby are useful for a variety of 
applications. In one embodiment, a DNA compound comprising a sequence that encodes 
the FK-520 ninth extender module is inserted into a DNA compound that comprises the 
coding sequence for a heterologous PKS. The resulting construct, in which the coding 
sequence for a module of the heterologous PKS is either replaced by that for the ninth 
extender module of the FK-520 PKS or the latter is merely added to coding sequences 
for the modules of the heterologous PKS, provides a novel PKS coding sequence. In 
another embodiment, a DNA compound comprising a sequence that encodes the ninth 
extender module of the FK-520 PKS is inserted into a DNA compound that comprises 
the coding sequence for the remainder of the FK-520 PKS or a recombinant FK-520 PKS 
that produces an FK-520 derivative. 

In another embodiment, a portion of the ninth extender module coding sequence 
is utilized in conjunction with other PKS coding sequences to create a hybrid module. In 
this embodiment, the invention provides, for example, either replacing the 
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or 2- 
hydroxymalonyl CoA specific AT; deleting any one, two, or all three of the KR, DH, and 
ER; and/or replacing any one, two, or all three of the KR, DH, and ER with another KR, 
DH, and/or ER. In addition, the KS and/or ACP can be replaced with another KS and/or 
ACP. In each of these replacements, the heterologous KS, AT, DH, KR, ER, or ACP 
coding sequence can originate from a coding sequence for another module of the FK-520 
PKS, from a coding sequence for a PKS that produces a polyketide other than FK-520, or 
from chemical synthesis. The resulting heterologous ninth extender module coding 
sequence can be utilized in conjunction with a PKS that synthesizes FK-520, an FK-520 
derivative, or another polyketide. In similar fashion, the corresponding domains in a 
module of a heterologous PKS can be replaced by one or more domains of the ninth
extender module of the FK-520 PKS. 

The tenth extender module of the FK-520 PKS includes a KS, an AT specific for 
malonyl CoA, and an ACP. The recombinant DNA compounds of the invention that 
encode the tenth extender module of the FK-520 PKS and the corresponding 
polypeptides encoded thereby are useful for a variety of applications. In one 
embodiment, a DNA compound comprising a sequence that encodes the FK-520 tenth 
extender module is inserted into a DNA compound that comprises the coding sequence
for a heterologous PKS. The resulting construct, in which the coding sequence for a
module of the heterologous PKS is either replaced by that for the tenth extender module
of the FK-520 PKS or the latter is merely added to coding sequences for the modules of
the heterologous PKS, provides a novel PKS coding sequence. In another embodiment, a
DNA compound comprising a sequence that encodes the tenth extender module of the
FK-520 PKS is inserted into a DNA compound that comprises the coding sequence for
the remainder of the FK-520 PKS or a recombinant FK-520 PKS that produces an
FK-520 derivative.

In another embodiment, a portion or all of the tenth extender module coding
sequence is utilized in conjunction with other PKS coding sequences to create a hybrid
module. In this embodiment, the invention provides, for example, either replacing the
malonyl CoA specific AT with a methylmalonyl CoA, ethylmalonyl CoA, or
2-hydroxymalonyl CoA specific AT; and/or inserting a KR, a KR and DH, or a KR, DH,
and an ER. In addition, the KS and/or ACP can be replaced with another KS and/or ACP.
In each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, or
ACP coding sequence can originate from a coding sequence for another module of the
FK-520 PKS, from a coding sequence for a PKS that produces a polyketide other than
FK-520, or from chemical synthesis. The resulting heterologous tenth extender module
coding sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes FK-520, an FK-520 derivative, or another polyketide. In similar fashion, the
corresponding domains in a module of a heterologous PKS can be replaced by one or
more domains of the tenth extender module of the FK-520 PKS.
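
The domain replacements and insertions described above for the ninth and tenth extender modules can be pictured with a small data model. The following Python sketch is illustrative only: the class name `Module` and the helper functions `swap_at` and `set_reductive` are hypothetical and do not appear in the specification, and the domains are listed in the order used in the text rather than in their order along the polypeptide.

```python
from dataclasses import dataclass, replace

# Listing order used in the text for the optional reductive domains.
REDUCTIVE_ORDER = ("KR", "DH", "ER")


@dataclass(frozen=True)
class Module:
    """Toy model of one PKS extender module (hypothetical, for illustration)."""
    name: str
    at_substrate: str                    # extender unit selected by the AT domain
    reductive: frozenset = frozenset()   # any subset of {"KR", "DH", "ER"}

    def domains(self):
        # KS, AT, and ACP are always present; reductive domains are optional.
        loop = [d for d in REDUCTIVE_ORDER if d in self.reductive]
        return ["KS", f"AT({self.at_substrate})", *loop, "ACP"]


def swap_at(module, new_substrate):
    """Return a hybrid module whose AT domain has a different specificity."""
    return replace(module, at_substrate=new_substrate)


def set_reductive(module, domains):
    """Return a hybrid module with the given set of reductive domains."""
    unknown = set(domains) - set(REDUCTIVE_ORDER)
    if unknown:
        raise ValueError(f"unknown reductive domains: {sorted(unknown)}")
    return replace(module, reductive=frozenset(domains))


# Modules as described in the text.
module9 = Module("FK-520 module 9", "methylmalonyl CoA", frozenset(REDUCTIVE_ORDER))
module10 = Module("FK-520 module 10", "malonyl CoA")

# One hybrid of module 10: AT swapped to methylmalonyl CoA, KR and DH inserted.
hybrid10 = set_reductive(swap_at(module10, "methylmalonyl CoA"), {"KR", "DH"})
print(hybrid10.domains())
```

Because the model is immutable, each alteration returns a new module and leaves the parent unchanged, which mirrors the way each hybrid coding sequence is a distinct construct derived from the original.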

The FK-520 polyketide precursor produced by the action of the tenth extender
module of the PKS is then attached to pipecolic acid and cyclized to form FK-520. The
enzyme FkbP is the NRPS-like enzyme that catalyzes these reactions. FkbP also includes
a thioesterase activity that cleaves the nascent FK-520 polyketide from the NRPS. The
present invention provides recombinant DNA compounds that encode the fkbP gene and
so provides recombinant methods for expressing the fkbP gene product in recombinant
host cells. The recombinant fkbP genes of the invention include those in which the
coding sequence for the adenylation domain has been mutated or replaced with coding
sequences from other NRPS-like enzymes so that the resulting recombinant FkbP
incorporates a moiety other than pipecolic acid. For the construction of host cells that do
not naturally produce pipecolic acid, the present invention provides recombinant DNA
compounds that express the enzymes that catalyze at least some of the biosynthesis of
pipecolic acid (see Nielsen et al., 1991, Biochem. 30: 5789-96). The fkbL gene encodes a
homolog of RapL, a lysine cyclodeaminase responsible in part for producing the
pipecolate unit added to the end of the polyketide chain. The fkbB and fkbL recombinant
genes of the invention can be used in heterologous hosts to produce compounds such as
FK-520 or, in conjunction with other PKS or NRPS genes, to produce known or novel
polyketides and non-ribosomal peptides.

The present invention also provides recombinant DNA compounds that encode 
the P450 oxidase and methyltransferase genes involved in the biosynthesis of FK-520. 
Figure 2 shows the various sites on the FK-520 polyketide core structure at which these 
enzymes act. By providing these genes in recombinant form, the present invention 
provides recombinant host cells that can produce FK-520. This is accomplished by 
introducing the recombinant PKS, P450 oxidase, and methyltransferase genes into a 
heterologous host cell. In a preferred embodiment, the heterologous host cell is 
Streptomyces coelicolor CH999 or Streptomyces lividans K4-114, as described in U.S.
Patent No. 5,830,750 and U.S. patent application Serial Nos. 08/828,898, filed 31 Mar. 
1997, and 09/181,833, filed 28 Oct. 1998, each of which is incorporated herein by 
reference. In addition, by providing recombinant host cells that express only a subset of 
these genes, the present invention provides methods for making FK-520 precursor 
compounds not readily obtainable by other means. 

In a related aspect, the present invention provides recombinant DNA compounds 
and vectors that are useful in generating, by homologous recombination, recombinant 
host cells that produce FK-520 precursor compounds. In this aspect of the invention, a 
native host cell that produces FK-520 is transformed with a vector (such as an SCP2* 
derived vector for Streptomyces host cells) that encodes one or more disrupted genes 
(i.e., a hydroxylase, a methyltransferase, or both) or merely flanking regions from those 
genes. When the vector integrates by homologous recombination, the native, functional 
gene is deleted or replaced by the non-functional recombinant gene, and the resulting 
host cell thus produces an FK-520 precursor. Such host cells can also be complemented 
by introduction of a modified form of the deleted or mutated non-functional gene to 
produce a novel compound. 

In one important embodiment, the present invention provides a hybrid PKS and 
the corresponding recombinant DNA compounds that encode those hybrid PKS 
enzymes. For purposes of the present invention, a hybrid PKS is a recombinant PKS that
comprises all or part of one or more modules and thioesterase/cyclase domain of a first
PKS and all or part of one or more modules, loading module, and thioesterase/cyclase
domain of a second PKS. In one preferred embodiment, the first PKS is all or part of the
FK-520 PKS, and the second PKS is only a portion or all of a non-FK-520 PKS.

One example of the preferred embodiment is an FK-520 PKS in which the AT
domain of module 8, which specifies a hydroxymalonyl CoA and from which the C-13
methoxy group of FK-520 is derived, is replaced by an AT domain that specifies a
malonyl, methylmalonyl, or ethylmalonyl CoA. Examples of such replacement AT
domains include the AT domains from modules 3, 12, and 13 of the rapamycin PKS
and from modules 1 and 2 of the erythromycin PKS. Such replacements, conducted at
the level of the gene for the PKS, are illustrated in the examples below. Another
illustrative example of such a hybrid PKS includes an FK-520 PKS in which the natural
loading module has been replaced with a loading module of another PKS. Another
example of such a hybrid PKS is an FK-520 PKS in which the AT domain of module
three is replaced with an AT domain that binds methylmalonyl CoA.

In another preferred embodiment, the first PKS is most but not all of a
non-FK-520 PKS, and the second PKS is only a portion or all of the FK-520 PKS. An
illustrative example of such a hybrid PKS includes an erythromycin PKS in which an AT
specific for methylmalonyl CoA is replaced with an AT from the FK-520 PKS specific for
malonyl CoA.

Those of skill in the art will recognize that all or part of either the first or second
PKS in a hybrid PKS of the invention need not be isolated from a naturally occurring
source. For example, only a small portion of an AT domain determines its specificity.
See U.S. provisional patent application Serial No. 60/091,526, incorporated herein by
reference. The state of the art in DNA synthesis allows the artisan to construct de novo
DNA compounds of size sufficient to construct a useful portion of a PKS module or
domain. For purposes of the present invention, such synthetic DNA compounds are
deemed to be a portion of a PKS.

Thus, the hybrid modules of the invention are incorporated into a PKS to provide
a hybrid PKS of the invention. A hybrid PKS of the invention can result not only:

(i) from fusions of heterologous domain (where heterologous means the domains
in that module are from at least two different naturally occurring modules) coding
sequences to produce a hybrid module coding sequence contained in a PKS gene whose
product is incorporated into a PKS,

but also:

(ii) from fusions of heterologous module (where heterologous module means two
modules are adjacent to one another that are not adjacent to one another in naturally
occurring PKS enzymes) coding sequences to produce a hybrid coding sequence
contained in a PKS gene whose product is incorporated into a PKS,

(iii) from expression of one or more FK-520 PKS genes with one or more
non-FK-520 PKS genes, including both naturally occurring and recombinant non-FK-520
PKS genes, and

(iv) from combinations of the foregoing.

Various hybrid PKSs of the invention illustrating these various alternatives are described
herein.

Examples of the production of a hybrid PKS by co-expression of PKS genes from
the FK-520 PKS and another non-FK-520 PKS include hybrid PKS enzymes produced
by coexpression of FK-520 and rapamycin PKS genes. Preferably, such hybrid PKS
enzymes are produced in recombinant Streptomyces host cells that produce FK-520 or
FK-506 but have been mutated to inactivate the gene whose function is to be replaced by
the rapamycin PKS gene introduced to produce the hybrid PKS. Particular examples
include (i) replacement of the fkbC gene with the rapB gene; and (ii) replacement of the
fkbA gene with the rapC gene. The latter hybrid PKS produces
13,15-didesmethoxy-FK-520, if the host cell is an FK-520 producing host cell, and
13,15-didesmethoxy-FK-506, if the host cell is an FK-506 producing host cell. The
compounds produced by these hybrid PKS enzymes are immunosuppressants and
neurotrophins but can be readily modified to act only as neurotrophins, as described in
Example 6, below.

Other illustrative hybrid PKS enzymes of the invention are prepared by replacing
the fkbA gene of an FK-520 or FK-506 producing host cell with a hybrid fkbA gene in
which: (a) the extender module 8 through 10, inclusive, coding sequences have been
replaced by the coding sequences for extender modules 12 to 14, inclusive, of the
rapamycin PKS; and (b) the module 8 coding sequences have been replaced by the
module 8 coding sequence of the rifamycin PKS. When expressed with the other,
naturally occurring FK-520 or FK-506 PKS genes and the genes of the modification
enzymes, the resulting hybrid PKS enzymes produce, respectively, (a)
13-desmethoxy-FK-520 or 13-desmethoxy-FK-506; and (b)
13-desmethoxy-13-methyl-FK-520 or 13-desmethoxy-13-methyl-FK-506. In a preferred
embodiment, these recombinant PKS genes of the invention are introduced into the
producing host cell by a vector such as pHU204, which is a plasmid pRM5 derivative
that has the well-characterized SCP2* replicon, the colE1 replicon, the tsr and bla
resistance genes, and a cos site. This vector can be used to introduce the recombinant
fkbA replacement gene in an FK-520 or FK-506 producing host cell (or a host cell
derived therefrom in which the endogenous fkbA gene has been rendered inactive by
mutation, deletion, or homologous recombination with the gene that replaces it) to
produce the desired hybrid PKS.

In constructing hybrid PKSs of the invention, certain general methods may be 
helpful. For example, it is often beneficial to retain the framework of the module to be 
altered to make the hybrid PKS. Thus, if one desires to add DH and ER functionalities to 
a module, it is often preferred to replace the KR domain of the original module with a 
KR, DH, and ER domain-containing segment from another module, instead of merely 
inserting DH and ER domains. One can alter the stereochemical specificity of a module 
by replacement of the KS domain with a KS domain from a module that specifies a 
different stereochemistry. See Lau et al., 1999, "Dissecting the role of acyltransferase
domains of modular polyketide synthases in the choice and stereochemical fate of
extender units," Biochemistry 35(5):1643-1651, incorporated herein by reference.
Stereochemistry can also be changed by changing the KR domain. Also, one can alter the
specificity of an AT domain by changing only a small segment of the domain. See Lau et
al., supra. One can also take advantage of known linker regions in PKS proteins to link
modules from two different PKSs to create a hybrid PKS. See Gokhale et al., 16 Apr.
1999, "Dissecting and Exploiting Intermodular Communication in Polyketide
Synthases," Science 284: 482-485, incorporated herein by reference.

The following Table lists references describing illustrative PKS genes and 
corresponding enzymes that can be utilized in the construction of the recombinant PKSs 
and the corresponding DNA compounds that encode them of the invention. Also 
presented are various references describing tailoring enzymes and corresponding genes 
that can be employed in accordance with the methods of the present invention. 
Avermectin

U.S. Pat. No. 5,252,474 to Merck.

MacNeil et al., 1993, Industrial Microorganisms: Basic and Applied Molecular
Genetics, Baltz, Hegeman, & Skatrud, eds. (ASM), pp. 245-256, A Comparison of the
Genes Encoding the Polyketide Synthases for Avermectin, Erythromycin, and
Nemadectin.

MacNeil et al., 1992, Gene 115: 119-125, Complex Organization of the
Streptomyces avermitilis genes encoding the avermectin polyketide synthase.

Ikeda et al., Aug. 1999, Organization of the biosynthetic gene cluster for the
polyketide anthelmintic macrolide avermectin in Streptomyces avermitilis, Proc. Natl.
Acad. Sci. USA 96: 9509-9514.
Candicidin (FR008)

Hu et al., 1994, Mol. Microbiol. 14: 163-172.
Epothilone

U.S. Pat. App. Serial No. 60/130,560, filed 22 April 1999.
Erythromycin

PCT Pub. No. 93/13663 to Abbott.
U.S. Pat. No. 5,824,513 to Abbott.
Donadio et al., 1991, Science 252: 675-9.

Cortes et al., 8 Nov. 1990, Nature 348: 176-8, An unusually large
multifunctional polypeptide in the erythromycin producing polyketide synthase of
Saccharopolyspora erythraea.

Glycosylation Enzymes 

PCT Pat. App. Pub. No. 97/23630 to Abbott. 
FK-506

Motamedi et al., 1998, The biosynthetic gene cluster for the macrolactone ring of
the immunosuppressant FK-506, Eur. J. Biochem. 256: 528-534.

Motamedi et al., 1997, Structural organization of a multifunctional polyketide
synthase involved in the biosynthesis of the macrolide immunosuppressant FK-506, Eur.
J. Biochem. 244: 74-80.

Methyltransferase

U.S. Pat. No. 5,264,355, issued 23 Nov. 1993, Methylating enzyme from
Streptomyces MA6858. 31-O-desmethyl-FK-506 methyltransferase.

Motamedi et al., 1996, Characterization of methyltransferase and
hydroxylase genes involved in the biosynthesis of the immunosuppressants FK-506 and
FK-520, J. Bacteriol. 178: 5243-5248.
Streptomyces hygroscopicus 

U.S. patent application Serial No. 09/154,083, filed 16 Sep. 1998. 
Lovastatin

U.S. Pat. No. 5,744,350 to Merck.
Narbomycin

U.S. patent application Serial No. 60/107,093, filed 5 Nov. 1998, and Serial No.
60/120,254, filed 16 Feb. 1999.
Nemadectin

MacNeil et al., 1993, supra.
Niddamycin

Kakavas et al., 1997, Identification and characterization of the niddamycin
polyketide synthase genes from Streptomyces caelestis, J. Bacteriol. 179: 7515-7522.
Oleandomycin

Swan et al., 1994, Characterisation of a Streptomyces antibioticus gene encoding
a type I polyketide synthase which has an unusual coding sequence, Mol. Gen. Genet.
242: 358-362.

U.S. patent application Serial No. 60/120,254, filed 16 Feb. 1999.

Olano et al., 1998, Analysis of a Streptomyces antibioticus chromosomal region
involved in oleandomycin biosynthesis, which encodes two glycosyltransferases
responsible for glycosylation of the macrolactone ring, Mol. Gen. Genet. 259(3):
299-308.

Picromycin

PCT patent application US99/15047, filed 2 Jul. 1999.
Xue et al., 1998, Hydroxylation of macrolactones YC-17 and narbomycin is
mediated by the pikC-encoded cytochrome P450 in Streptomyces venezuelae, Chemistry
& Biology 5(11): 661-667.

Xue et al., Oct. 1998, A gene cluster for macrolide antibiotic biosynthesis in
Streptomyces venezuelae: Architecture of metabolic diversity, Proc. Natl. Acad. Sci.
USA 95: 12111-12116.
Platenolide

EP Pat. App. Pub. No. 791,656 to Lilly.
Rapamycin

Schwecke et al., Aug. 1995, The biosynthetic gene cluster for the
polyketide rapamycin, Proc. Natl. Acad. Sci. USA 92: 7839-7843.
Aparicio et al., 1996, Organization of the biosynthetic gene cluster for rapamycin
in Streptomyces hygroscopicus: analysis of the enzymatic domains in the modular
polyketide synthase, Gene 169: 9-16.
Rifamycin

August et al., 13 Feb. 1998, Biosynthesis of the ansamycin antibiotic rifamycin:
deductions from the molecular analysis of the rif biosynthetic gene cluster of
Amycolatopsis mediterranei S699, Chemistry & Biology 5(2): 69-79.
Sorangium PKS

U.S. patent application Serial No. 09/144,085, filed 31 Aug. 1998.
Soraphen

U.S. Pat. No. 5,716,849 to Novartis.
Schupp et al., 1995, J. Bacteriology 177: 3673-3679, A Sorangium cellulosum
(Myxobacterium) Gene Cluster for the Biosynthesis of the Macrolide Antibiotic
Soraphen A: Cloning, Characterization, and Homology to Polyketide Synthase Genes
from Actinomycetes.
Spiramycin

U.S. Pat. No. 5,098,837 to Lilly.

Activator Gene

U.S. Pat. No. 5,514,544 to Lilly.
Tylosin

EP Pub. No. 791,655 to Lilly.

U.S. Pat. No. 5,876,991 to Lilly.

Kuhstoss et al., 1996, Gene 183: 231-6, Production of a novel polyketide
through the construction of a hybrid polyketide synthase.
Tailoring enzymes

Merson-Davies and Cundliffe, 1994, Mol. Microbiol. 13: 349-355, Analysis of
five tylosin biosynthetic genes from the tylBA region of the Streptomyces fradiae
genome.

As the above Table illustrates, there are a wide variety of polyketide synthase
genes that serve as readily available sources of DNA and sequence information for use in
constructing the hybrid PKS-encoding DNA compounds of the invention. Methods for
constructing hybrid PKS-encoding DNA compounds are described without reference to
the FK-520 PKS in PCT patent publication No. 98/51695; U.S. Patent Nos. 5,672,491
and 5,712,146; and U.S. patent application Serial Nos. 09/073,538, filed 6 May 1998, and
09/141,908, filed 28 Aug. 1998, each of which is incorporated herein by reference.

The hybrid PKS-encoding DNA compounds of the invention can be and often are
hybrids of more than two PKS genes. Moreover, there are often two or more modules in
the hybrid PKS in which all or part of the module is derived from a second (or third)
PKS. Thus, as one illustrative example, the present invention provides a hybrid FK-520
PKS that contains the naturally occurring loading module and FkbP as well as modules
one, two, four, six, seven, eight, nine, and ten of the FK-520 PKS and further
contains hybrid or heterologous modules three and five. Hybrid or heterologous module
three contains an AT domain that is specific for methylmalonyl CoA and can be derived,
for example, from the erythromycin or rapamycin PKS genes. Hybrid or heterologous
module five contains an AT domain that is specific for malonyl CoA and can be derived,
for example, from the picromycin or rapamycin PKS genes.

While an important embodiment of the present invention relates to hybrid PKS
enzymes and corresponding genes, the present invention also provides recombinant
FK-520 PKS genes in which there is no second PKS gene sequence present but which
differ from the FK-520 PKS gene by one or more deletions. The deletions can encompass
one or more modules and/or can be limited to a partial deletion within one or more
modules. When a deletion encompasses an entire module, the resulting FK-520
derivative is at least two carbons shorter than the compound from which it was derived.
When a deletion is within a module, the deletion typically encompasses a KR, DH, or ER
domain, or both DH and ER domains, or both KR and DH domains, or all three KR, DH,
and ER domains.
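
The phrase "at least two carbons shorter" can be made concrete with simple arithmetic: each extender module adds two backbone carbons, plus any side-chain carbons carried by its extender unit. The Python sketch below is a rough illustration under that assumption; the table `CARBONS_PER_EXTENDER` and the function `carbons_lost` are hypothetical names, and the per-unit carbon counts reflect general polyketide chemistry rather than figures stated in this specification.

```python
# Carbons contributed to the polyketide by one extender unit:
# two backbone carbons plus any side-chain carbons (assumed values).
CARBONS_PER_EXTENDER = {
    "malonyl CoA": 2,        # backbone only
    "methylmalonyl CoA": 3,  # backbone + methyl branch
    "ethylmalonyl CoA": 4,   # backbone + ethyl branch
}


def carbons_lost(deleted_module_substrates):
    """Total carbons removed when the listed modules are deleted."""
    return sum(CARBONS_PER_EXTENDER[s] for s in deleted_module_substrates)


# Deleting module ten (malonyl CoA specific) removes exactly two carbons.
print(carbons_lost(["malonyl CoA"]))
# Deleting module nine (methylmalonyl CoA specific) removes three.
print(carbons_lost(["methylmalonyl CoA"]))
```

The minimum of two carbons corresponds to deletion of a single malonyl CoA specific module; deletion of a module that uses a branched extender unit removes more.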

To construct a hybrid PKS or FK-520 derivative PKS gene of the invention, one 
can employ a technique, described in PCT Pub. No. 98/27203 and U.S. patent 
application Serial No. 08/989,332, filed 11 Dec. 1997, each of which is incorporated
herein by reference, in which the large PKS gene is divided into two or more, typically 
three, segments, and each segment is placed on a separate expression vector. In this 
manner, each of the segments of the gene can be altered, and various altered segments 
can be combined in a single host cell to provide a recombinant PKS gene of the 
invention. This technique makes more efficient the construction of large libraries of 
recombinant PKS genes, vectors for expressing those genes, and host cells comprising 
those vectors. 
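
The efficiency gain from dividing the PKS gene into separately altered segments is combinatorial. The following Python sketch illustrates the counting only; the segment labels are hypothetical placeholders, not constructs described in this specification.

```python
from itertools import product

# Hypothetical variants of each of three gene segments, one vector per variant.
segment_variants = [
    ["seg1-wt", "seg1-A"],
    ["seg2-wt", "seg2-B", "seg2-C"],
    ["seg3-wt", "seg3-D"],
]

# Every choice of one variant per segment gives a distinct recombinant PKS gene set.
library = list(product(*segment_variants))

vectors_built = sum(len(variants) for variants in segment_variants)
print(len(library), "combinations from", vectors_built, "vectors")
```

In this toy case seven vectors yield twelve combinations, and the gap widens as the number of variants per segment grows, since the library size is the product of the variant counts while the cloning effort is their sum.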

Thus, in one important embodiment, the recombinant DNA compounds of the 
invention are expression vectors. As used herein, the term expression vector refers to any 
nucleic acid that can be introduced into a host cell or cell-free transcription and 
translation medium. An expression vector can be maintained stably or transiently in a 
cell, whether as part of the chromosomal or other DNA in the cell or in any cellular 
compartment, such as a replicating vector in the cytoplasm. An expression vector also 
comprises a gene that serves to produce RNA that is translated into a polypeptide in the 
cell or cell extract. Furthermore, expression vectors typically contain additional 
functional elements, such as resistance-conferring genes to act as selectable markers. 

The various components of an expression vector can vary widely, depending on 
the intended use of the vector. In particular, the components depend on the host cell(s) in 
which the vector will be used or is intended to function. Vector components for 
expression and maintenance of vectors in E. coli are widely known and commercially 
available, as are vector components for other commonly used organisms, such as yeast 
cells and Streptomyces cells.

In a preferred embodiment, the expression vectors of the invention are used to
construct recombinant Streptomyces host cells that express a recombinant PKS of the
invention. Preferred Streptomyces host cell/vector combinations of the invention include
S. coelicolor CH999 and S. lividans K4-114 host cells, which do not produce
actinorhodin, and expression vectors derived from the pRM1 and pRM5 vectors, as
described in U.S. Patent No. 5,830,750 and U.S. patent application Serial Nos.
08/828,898, filed 31 Mar. 1997, and 09/181,833, filed 28 Oct. 1998, each of which is
incorporated herein by reference.

The present invention provides a wide variety of expression vectors for use in
Streptomyces. For replicating vectors, the origin of replication can be, for example and
without limitation, that of a low copy number vector, such as SCP2* (see Hopwood et al.,
Genetic Manipulation of Streptomyces: A Laboratory Manual (The John Innes
Foundation, Norwich, U.K., 1985); Lydiate et al., 1985, Gene 35: 223-235; and Kieser
and Melton, 1988, Gene 65: 83-91, each of which is incorporated herein by reference),
SLP1.2 (Thompson et al., 1982, Gene 20: 51-62, incorporated herein by reference), and
SG5(ts) (Muth et al., 1989, Mol. Gen. Genet. 219: 341-348, and Bierman et al., 1992,
Gene 116: 43-49, each of which is incorporated herein by reference), or that of a high
copy number vector, such as pIJ101 and pJV1 (see Katz et al., 1983, J. Gen. Microbiol.
129: 2703-2714; Vara et al., 1989, J. Bacteriol. 171: 5782-5781; and Servin-Gonzalez,
1993, Plasmid 30: 131-140, each of which is incorporated herein by reference).
Generally, however, high copy number vectors are not preferred for expression of genes
contained on large segments of DNA. For non-replicating and integrating vectors, it is
useful to include at least an E. coli origin of replication, such as from pUC, p1P, p1I, and
pBR. For phage based vectors, the phages phiC31 and KC515 can be employed (see
Hopwood et al., supra).

Typically, the expression vector will comprise one or more marker genes by
which host cells containing the vector can be identified and/or selected. Useful antibiotic
resistance conferring genes for use in Streptomyces host cells include the ermE (confers
resistance to erythromycin and other macrolides and lincomycin), tsr (confers resistance
to thiostrepton), aadA (confers resistance to spectinomycin and streptomycin), aacC4
(confers resistance to apramycin, kanamycin, gentamicin, geneticin (G418), and
neomycin), hyg (confers resistance to hygromycin), and vph (confers resistance to
viomycin) resistance conferring genes.

The recombinant PKS gene on the vector will be under the control of a promoter,
typically with an attendant ribosome binding site sequence. The present invention
provides the endogenous promoters of the FK-520 PKS and related biosynthetic genes in
recombinant form, and these promoters are preferred for use in the native hosts and in
heterologous hosts in which the promoters function. A preferred promoter of the
invention is the fkbO gene promoter, comprised in a sequence of about 270 bp between
the start of the open reading frames of the fkbO and fkbB genes. The fkbO promoter is
believed to be bi-directional in that it promotes transcription of the genes fkbO, fkbP, and
fkbA in one direction and fkbB, fkbC, and fkbL in the other. Thus, in one aspect, the
present invention provides a recombinant expression vector comprising the promoter of
the fkbO gene of an FK-520 producing organism positioned to transcribe a gene other
than fkbO. In a preferred embodiment, the transcribed gene is an FK-520 PKS gene. In
another preferred embodiment, the transcribed gene is a gene that encodes a protein
comprised in a hybrid PKS.

Heterologous promoters can also be employed and are preferred for use in host
cells in which the endogenous FK-520 PKS gene promoters do not function or function
poorly. A preferred heterologous promoter is the actI promoter and its attendant activator
gene actII-ORF4, which is provided in the pRM1 and pRM5 expression vectors, supra.
This promoter is activated in the stationary phase of growth when secondary metabolites
are normally synthesized. Other useful Streptomyces promoters include without
limitation those from the ermE gene and the melC1 gene, which act constitutively, and
the tipA gene and the merA gene, which can be induced at any growth stage. In addition,
the T7 RNA polymerase system has been transferred to Streptomyces and can be
employed in the vectors and host cells of the invention. In this system, the coding
sequence for the T7 RNA polymerase is inserted into a neutral site of the chromosome or
in a vector under the control of the inducible merA promoter, and the gene of interest is
placed under the control of the T7 promoter. As noted above, one or more activator
genes can also be employed to enhance the activity of a promoter. Activator genes in
addition to the actII-ORF4 gene discussed above include dnrI, redD, and ptpA genes (see
U.S. patent application Serial No. 09/181,833, supra) to activate promoters under their
control.

In addition to providing recombinant DNA compounds that encode the FK-520
PKS, the present invention also provides DNA compounds that encode the enzymes that
produce the ethylmalonyl CoA and 2-hydroxymalonyl CoA utilized in the synthesis of
FK-520. Thus, the present invention also provides recombinant host cells that express the
genes required for the biosynthesis of ethylmalonyl CoA and 2-hydroxymalonyl CoA.
Figures 3 and 4 show the location of these genes on the cosmids of the invention and the
biosynthetic pathway that produces ethylmalonyl CoA.

For 2-hydroxymalonyl CoA biosynthesis, the fkbH, fkbI, fkbJ, and fkbK genes are
sufficient to confer this ability on Streptomyces host cells. For conversion of
2-hydroxymalonyl to 2-methoxymalonyl, the fkbG gene is also employed. While the
complete coding sequence for fkbH is provided on the cosmids of the invention, the
sequence for this gene provided herein may be missing a T residue, based on a
comparison made with a similar gene cloned from the ansamitocin gene cluster by Dr. H.
Floss. Where the sequence herein shows one T, there may be two, resulting in an
extension of the fkbH reading frame to encode the amino acid sequence:

MTI\nECCLVWDLDNTLWRGTVLEDDEVVLTDEIREVITTLDDRGILQA 
DLAWERLERLGVAEYFVLAMGWGPKSQSVREIATELNFAPTTIAFro 
EVAFHLPEVRCYPAEQAATLLSLPEFSPPVSTVDSRRRRLMYQAGFARDQAREA 
YSGPDEDFLRSLDLSMTIAPAGEEELSRVEELTLRTSQMNATGVHYSDADLRAL 
LTDPAHEVLVVTMGDRFGPHGAVGIILLEKKPSTWHLKLLATSCRVVSFGAGAT
ILNWLTDQGARAGAHLVADFRRTDRNRMMEIAYRFAGFADSDCPCVSEVAG^ 
AAGVERLHLEPSARPAPTTLTLTAADIAPVTVSAAG. 
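
The effect of a single missing T on a reading frame can be shown with a toy example. The Python sketch below uses a short, made-up DNA sequence (not the fkbH sequence) and the standard genetic code; the function name `translate` and the variable names are hypothetical. With one T the frame reaches a stop codon early; with two T residues at the same position the frame shifts and the open reading frame is extended.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up sequence as read with a single T at index 6.
one_t = "ATGGCCTGACCGGTGAGTAA"
# Same sequence with a second T inserted at that position.
two_t = one_t[:6] + "T" + one_t[6:]

print(translate(one_t))  # short product: the frame hits TGA early
print(translate(two_t))  # longer product: the shifted frame reads through
```

A single-base insertion changes every downstream codon, which is why one additional T can both remove an apparent stop codon and extend the encoded polypeptide.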

For ethylmalonyl CoA biosynthesis, one requires only a crotonyl CoA reductase,
which can be supplied by the host cell but can also be supplied by recombinant
expression of the fkbS gene of the present invention. To increase yield of ethylmalonyl
CoA, one can also express the fkbE and fkbU genes. While such production can
be achieved using only the recombinant genes above, one can also achieve such
production by placing into the recombinant host cell a large segment of the DNA
provided by the cosmids of the invention. Thus, for 2-hydroxymalonyl and
2-methoxymalonyl CoA biosynthesis, one can simply provide the cells with the segment
of DNA located on the left side of the FK-520 PKS genes shown in Figure 1. For
ethylmalonyl CoA biosynthesis, one can simply provide the cells with the segment of
DNA located on the right side of the FK-520 PKS genes shown in Figure 1 or,
alternatively, both the right and left segments of DNA.

The recombinant DNA expression vectors that encode these genes can be used to
construct recombinant host cells that can make these important polyketide building
blocks from cells that otherwise are unable to produce them. For example, Streptomyces
coelicolor and Streptomyces lividans do not synthesize ethylmalonyl CoA or
2-hydroxymalonyl CoA. The invention provides methods and vectors for constructing
recombinant Streptomyces coelicolor and Streptomyces lividans that are able to
synthesize either or both ethylmalonyl CoA and 2-hydroxymalonyl CoA. These host
cells are thus able to make polyketides that require these substrates and that cannot
otherwise be made in such cells.

In a preferred embodiment, the present invention provides recombinant 
Streptomyces host cells, such as S. coelicolor and S. lividans, that have been transformed
with a recombinant vector of the invention that codes for the expression of the 
ethylmalonyl CoA biosynthetic genes. The resulting host cells produce ethylmalonyl 
CoA and so are preferred host cells for the production of polyketides produced by PKS 
enzymes that comprise one or more AT domains specific for ethylmalonyl CoA. 
Illustrative PKS enzymes of this type include the FK-520 PKS and a recombinant PKS in 
which one or more AT domains is specific for ethylmalonyl CoA. 

In a related embodiment, the present invention provides Streptomyces host cells 
in which one or more of the ethylmalonyl or 2-hydroxymalonyl biosynthetic genes have 
been deleted by homologous recombination or rendered inactive by mutation. For 
example, deletion or inactivation of the fkbG gene can prevent formation of the methoxyl
groups at C-13 and C-15 of FK-520 (or, in the corresponding FK-506 producing cell,
FK-506), leading to the production of 13,15-didesmethoxy-13,15-dihydroxy-FK-520
(or, in the corresponding FK-506 producing cell, 13,15-didesmethoxy-13,15-dihydroxy-
FK-506). If the fkbG gene product acts on 2-hydroxymalonyl and the resulting 2-
methoxymalonyl substrate is required for incorporation by the PKS, the AT domains of
modules 7 and 8 may bind malonyl CoA and methylmalonyl CoA. Such incorporation
results in the production of a mixture of polyketides in which the methoxy groups at
C-13 and C-15 of FK-520 (or FK-506) are replaced by either hydrogen or methyl.

This possibility of non-specific binding results from the construction of a hybrid 
PKS of the invention in which the AT domain of module 8 of the FK-520 PKS replaced 
the AT domain of module 6 of DEBS. The resulting PKS produced, in Streptomyces 
lividans, 6-dEB and 2-desmethyl-6-dEB, indicating that the AT domain of module 8 of 
the FK-520 PKS could bind malonyl CoA and methylmalonyl CoA substrates. Thus, one 
could possibly also prepare the 13,15-didesmethoxy-FK-520 and corresponding FK-506 
compounds of the invention by deleting or otherwise inactivating one or more or all of 
the genes required for 2-hydroxymalonyl CoA biosynthesis, i.e., the fkbH, fkbI, fkbJ, and
fkbK genes. In any event, the deletion or inactivation of one or more biosynthetic genes
required for ethylmalonyl and/or 2-hydroxymalonyl production prevents the formation of 
polyketides requiring ethylmalonyl and/or 2-hydroxymalonyl for biosynthesis, and the
resulting host cells are thus preferred for production of polyketides that do not require
the same. 

The host cells of the invention can be grown and fermented under conditions
known in the art for other purposes to produce the compounds of the invention. See, e.g.,
U.S. Patent Nos. 5,194,378; 5,116,756; and 5,494,820, incorporated herein by reference,
for suitable fermentation processes. The compounds of the invention can be isolated
from the fermentation broths of these cultured cells and purified by standard procedures.
Preferred compounds of the invention include the following compounds: 13-
desmethoxy-FK-506; 13-desmethoxy-FK-520; 13,15-didesmethoxy-FK-506; 13,15-
didesmethoxy-FK-520; 13-desmethoxy-18-hydroxy-FK-506; 13-desmethoxy-18-
hydroxy-FK-520; 13,15-didesmethoxy-18-hydroxy-FK-506; and 13,15-didesmethoxy-
18-hydroxy-FK-520. These compounds can be further modified as described for
tacrolimus and FK-520 in U.S. Patent Nos. 5,225,403; 5,189,042; 5,164,495; 5,068,323;
4,980,466; and 4,920,218, incorporated herein by reference.

Other compounds of the invention are shown in Figure 8, Parts A and B. In
Figure 8, Part A, illustrative C-32-substituted compounds of the invention are shown in
two columns under the heading R. The substituted compounds are preferred for topical
administration and are applied to the dermis for treatment of conditions such as psoriasis.
In Figure 8, Part B, illustrative reaction schemes for making the compounds shown in
Figure 8, Part A, are provided. In the upper scheme in Figure 8, Part B, the C-32
substitution is a tetrazole moiety, illustrative of the groups shown in the left column
under R in Figure 8, Part A. In the lower scheme in Figure 8, Part B, the C-32
substitution is a disubstituted amino group, where R3 and R4 can be any group similar to
the illustrative groups shown attached to the amine in the right column under R in Figure
8, Part A. While Figure 8 shows the C-32-substituted compounds in which the C-15-
methoxy is present, the invention includes these C-32-substituted compounds in which
C-15 is ethyl, methyl, or hydrogen. Also, while C-21 is shown as substituted with ethyl
or allyl, the compounds of the invention include the C-32-substituted compounds in
which C-21 is substituted with hydrogen or methyl.

To make these C-32-substituted compounds, Figure 8, Part B, provides
illustrative reaction schemes. Thus, a selective reaction of the starting compound (see
Figure 8, Part B, for an illustrative starting compound) with trifluoromethanesulfonic
anhydride in the presence of a base yields the C-32 O-triflate derivative, as shown in the
upper scheme of Figure 8, Part B. Displacement of the triflate with 1H-tetrazole or
triazole derivatives provides the C-32 tetrazole or triazole derivative. As shown in the
lower scheme of Figure 8, Part B, reacting the starting compound with p-
nitrophenylchloroformate yields the corresponding carbonate, which, upon displacement
with an amino compound, provides the corresponding carbamate derivative.

The compounds can be readily formulated to provide the pharmaceutical
compositions of the invention. The pharmaceutical compositions of the invention can be
used in the form of a pharmaceutical preparation, for example, in solid, semisolid, or
liquid form. This preparation contains one or more of the compounds of the invention as
an active ingredient in admixture with an organic or inorganic carrier or excipient
suitable for external, enteral, or parenteral application. The active ingredient may be
compounded, for example, with the usual non-toxic, pharmaceutically acceptable carriers
for tablets, pellets, capsules, suppositories, solutions, emulsions, suspensions, and any
other form suitable for use. Suitable formulation processes and compositions for the
compounds of the present invention are described with respect to tacrolimus in U.S.
Patent Nos. 5,939,427; 5,922,729; 5,385,907; 5,338,684; and 5,260,301, incorporated
herein by reference. Many of the compounds of the invention contain one or more chiral
centers, and all of the stereoisomers are included within the scope of the invention, as
pure compounds as well as mixtures of stereoisomers. Thus the compounds of the
invention may be supplied as a mixture of stereoisomers in any proportion.

The carriers which can be used include water, glucose, lactose, gum acacia,
gelatin, mannitol, starch paste, magnesium trisilicate, talc, corn starch, keratin, colloidal
silica, potato starch, urea, and other carriers suitable for use in manufacturing
preparations, in solid, semi-solid, or liquefied form. In addition, auxiliary stabilizing,
thickening, and coloring agents and perfumes may be used. For example, the compounds
of the invention may be utilized with hydroxypropyl methylcellulose essentially as
described in U.S. Patent No. 4,916,138, incorporated herein by reference, or with a
surfactant essentially as described in EPO patent publication No. 428,169, incorporated
herein by reference.

Oral dosage forms may be prepared essentially as described by Hondo et al.,
1987, Transplantation Proceedings XIX, Supp. 6: 17-22, incorporated herein by
reference. Dosage forms for external application may be prepared essentially as
described in EPO patent publication No. 423,714, incorporated herein by reference. The
active compound is included in the pharmaceutical composition in an amount sufficient
to produce the desired effect upon the disease process or condition.

For the treatment of conditions and diseases relating to immunosuppression or
neuronal damage, a compound of the invention may be administered orally, topically,
parenterally, by inhalation spray, or rectally in dosage unit formulations containing
conventional non-toxic pharmaceutically acceptable carriers, adjuvants, and vehicles. The
term parenteral, as used herein, includes subcutaneous injections, and intravenous,
intramuscular, and intrasternal injection or infusion techniques.

Dosage levels of the compounds of the present invention are of the order from
about 0.01 mg to about 50 mg per kilogram of body weight per day, preferably from
about 0.1 mg to about 10 mg per kilogram of body weight per day. The dosage levels are
useful in the treatment of the above-indicated conditions (from about 0.7 mg to about 3.5
g per patient per day, assuming a 70 kg patient). In addition, the compounds of the
present invention may be administered on an intermittent basis, i.e., at semi-weekly,
weekly, semi-monthly, or monthly intervals.
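
The per-patient figures in the preceding paragraph follow from multiplying the per-kilogram range by body weight. A minimal Python sketch of that arithmetic is given here, assuming the 70 kg reference patient stated in the text; the function name `daily_dose_mg` is illustrative only and is not part of the specification.

```python
def daily_dose_mg(dose_mg_per_kg, body_weight_kg=70.0):
    """Scale a weight-normalized daily dose (mg/kg/day) to mg per patient per day."""
    return dose_mg_per_kg * body_weight_kg


# Ranges quoted in the text, in mg per kilogram of body weight per day.
broad_range = (0.01, 50.0)
preferred_range = (0.1, 10.0)

for label, (low, high) in (("broad", broad_range), ("preferred", preferred_range)):
    print(f"{label}: {daily_dose_mg(low):.1f} mg to {daily_dose_mg(high):.1f} mg per day")
```

For a 70 kg patient the broad range works out to about 0.7 mg to 3500 mg (3.5 g) per day, and the preferred range to about 7 mg to 700 mg per day.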

The amount of active ingredient that may be combined with the carrier materials
to produce a single dosage form will vary depending upon the host treated and the
particular mode of administration. For example, a formulation intended for oral
administration to humans may contain from 0.5 mg to 5 g of active agent compounded
with an appropriate and convenient amount of carrier material, which may vary from
about 5 percent to about 95 percent of the total composition. Dosage unit forms will
generally contain from about 0.5 mg to about 500 mg of active ingredient. For external
administration, the compounds of the invention can be formulated within the range of,
for example, 0.00001% to 60% by weight, preferably from 0.001% to 10% by weight,
and most preferably from about 0.005% to 0.8% by weight. The compounds and
compositions of the invention are useful in treating disease conditions using doses and
administration schedules as described for tacrolimus in U.S. Patent Nos. 5,542,436;
5,365,948; 5,348,966; and 5,196,437, incorporated herein by reference. The compounds
of the invention can be used as single therapeutic agents or in combination with other
therapeutic agents. Drugs that can be usefully combined with compounds of the
invention include one or more immunosuppressant agents such as rapamycin,
cyclosporin A, FK-506, or one or more neurotrophic agents.

It will be understood, however, that the specific dosage level for any particular
patient will depend on a variety of factors. These factors include the activity of the
specific compound employed; the age, body weight, general health, sex, and diet of the
subject; the time and route of administration and the rate of excretion of the drug;
whether a drug combination is employed in the treatment; and the severity of the
particular disease or condition for which therapy is sought.
A detailed description of the invention having been provided above, the
following examples are given for the purpose of illustrating the present invention and 
shall not be construed as being a limitation on the scope of the invention or claims. 

Example 1

Replacement of Methoxyl with Hydrogen or Methyl at C-13 of FK-520
The C-13 methoxyl group is introduced into FK-520 via an AT domain in
extender module 8 of the PKS that is specific for hydroxymalonyl and by methylation of
the hydroxyl group by an S-adenosyl methionine (SAM) dependent methyltransferase.
Metabolism of FK-506 and FK-520 primarily involves oxidation at the C-13 position
into an inactive derivative that is further degraded by host P450 and other enzymes. The
present invention provides compounds related in structure to FK-506 and FK-520 that do
not contain the C-13 methoxy group and exhibit greater stability and a longer half-life in
vivo. These compounds are useful medicaments due to their immunosuppressive and
neurotrophic activities, and the invention provides the compounds in purified form and
as pharmaceutical compositions.

The present invention also provides the novel PKS enzymes that produce these
novel compounds as well as the expression vectors and host cells that produce the novel
PKS enzymes. The novel PKS enzymes include, among others, those that contain an AT
domain specific for either malonyl CoA or methylmalonyl CoA in module 8 of the
FK-506 and FK-520 PKS. This example describes the construction of recombinant DNA
compounds that encode the novel FK-520 PKS enzymes and the transformation of host
cells with those recombinant DNA compounds to produce the novel PKS enzymes and
the polyketides produced thereby.

To construct an expression cassette for performing module 8 AT domain
replacements in the FK-520 PKS, a 4.6 kb SphI fragment from the FK-520 gene cluster
was cloned into plasmid pLitmus 38 (a cloning vector available from New England
Biolabs). The 4.6 kb SphI fragment, which encodes the ACP domain of module 7
followed by module 8 through the KR domain, was isolated from an agarose gel after
digesting the cosmid pKOS65-C31 with SphI. The clone having the insert oriented so
the single SacI site was nearest to the SpeI end of the polylinker was identified and
designated as plasmid pKOS60-21-67. To generate appropriate cloning sites, two linkers
were ligated sequentially as follows. First, a linker was ligated between the SpeI and
SacI sites to introduce a BglII site at the 5' end of the cassette, to eliminate interfering
polylinker sites, and to reduce the total insert size to 4.5 kb (the limit of the phage
KC515). The ligation reactions contained 5 picomolar unphosphorylated linker DNA and
0.1 picomolar vector DNA, i.e., a 50-fold molar excess of linker to vector. The linker had

the following sequence: 

5'-CTAGTGGGCAGATCTGGCAGCT-3'
    3'-ACCCGTCTAGACCG-5'

The resulting plasmid was designated pKOS60-27-1.
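
The way the two linker strands anneal, and which single-stranded overhangs they leave for the SpeI and SacI ends, can be checked with a short Python sketch. The strand sequences are taken from the text with OCR spacing removed; the variable and function names are illustrative assumptions, not part of the original disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


top = "CTAGTGGGCAGATCTGGCAGCT"   # upper strand, written 5'->3'
bottom_3to5 = "ACCCGTCTAGACCG"   # lower strand, written 3'->5' as in the text
bottom = bottom_3to5[::-1]       # same strand rewritten 5'->3'

# The region of the top strand that base-pairs with the bottom strand.
paired = reverse_complement(bottom)
start = top.find(paired)

left_overhang = top[:start]                  # unpaired 5' end of the top strand
right_overhang = top[start + len(paired):]   # unpaired 3' end of the top strand

print("paired region:", paired)
print("5' overhang:", left_overhang)
print("3' overhang:", right_overhang)
print("BglII site (AGATCT) present:", "AGATCT" in top)
```

The 5' CTAG overhang is the one left by SpeI digestion and the 3' AGCT overhang is the one left by SacI digestion, consistent with ligation of the linker between those two sites.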

Next, a linker of the following sequence was ligated between the unique SphI and
AflII sites of plasmid pKOS60-27-1 to introduce an NsiI site at the 3' end of the module
8 cassette. The linker employed was:

    5'-GGGATGCATGGC-3'
3'-GTACCCCTACGTACCGAATT-5'

The resulting plasmid was designated pKOS60-29-55.

To allow in-frame insertions of alternative AT domains, sites were engineered at
the 5' end (AvrII or NheI) and 3' end (XhoI) of the AT domain using the polymerase
chain reaction (PCR) as follows. Plasmid pKOS60-29-55 was used as a template for the
PCR and sequence 5' to the AT domain was amplified with the primers SpeBgl-fwd and
either Avr-rev or Nhe-rev:

SpeBgl-fwd  5'-CGACTCACTAGTGGGCAGATCTGG-3'
Avr-rev     5'-CACGCCTAGGCCGGTCGGTCTCGGGCCAC-3'
Nhe-rev     5'-GCGGCTAGCTGCTCGCCCATCGCGGGATGC-3'

The PCR included, in a 50 ul reaction, 5 ul of 10x Pfu polymerase buffer
(Stratagene), 5 ul 10x z-dNTP mixture (2 mM dATP, 2 mM dCTP, 2 mM dTTP, 1 mM
dGTP, 1 mM 7-deaza-GTP), 5 ul DMSO, 2 ul of each primer (10 uM), 1 ul of template
DNA (0.1 ug/ul), and 1 ul of cloned Pfu polymerase (Stratagene). The PCR conditions
were 95°C for 2 min., 25 cycles at 95°C for 30 sec, 60°C for 30 sec, and 72°C for 4
min., followed by 4 min. at 72°C and a hold at 0°C. The amplified DNA products and
the Litmus vectors were cut with the appropriate restriction enzymes (BglII and AvrII or
SpeI and NheI), and cloned into either pLitmus 28 or pLitmus 38 (New England Biolabs),
respectively, to generate the constructs designated pKOS60-37-4 and pKOS60-37-2,
respectively.
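
The reaction recipe above can be tallied with a short Python sketch that computes the remaining volume and a few final concentrations. It is only a bookkeeping aid: the text does not say what makes up the balance of the 50 ul, so treating it as water is an assumption, and the programmed cycling time ignores temperature ramping. All names are illustrative.

```python
REACTION_UL = 50.0

# Volumes listed in the text, in microliters.
components_ul = {
    "10x Pfu buffer": 5.0,
    "10x z-dNTP mixture": 5.0,
    "DMSO": 5.0,
    "forward primer (10 uM)": 2.0,
    "reverse primer (10 uM)": 2.0,
    "template DNA (0.1 ug/ul)": 1.0,
    "Pfu polymerase": 1.0,
}

# Assumption: the balance of the reaction is water.
water_ul = REACTION_UL - sum(components_ul.values())


def final_concentration(stock, volume_ul):
    """Dilute a stock concentration into the 50 ul reaction (same units as the stock)."""
    return stock * volume_ul / REACTION_UL


primer_final_uM = final_concentration(10.0, 2.0)
datp_final_mM = final_concentration(2.0, 5.0)   # dATP, dCTP, and dTTP stocks are 2 mM
dmso_percent = 100.0 * components_ul["DMSO"] / REACTION_UL
template_ng = 0.1 * 1.0 * 1000.0                 # 0.1 ug/ul x 1 ul, in nanograms

# Programmed time: initial denaturation, 25 cycles, final extension (seconds).
cycling_seconds = 2 * 60 + 25 * (30 + 30 + 4 * 60) + 4 * 60

print(f"water to add: {water_ul:.0f} ul")
print(f"each primer: {primer_final_uM:.1f} uM final")
print(f"dATP: {datp_final_mM:.1f} mM final")
print(f"DMSO: {dmso_percent:.0f}% v/v, template: {template_ng:.0f} ng")
print(f"programmed cycling time: {cycling_seconds / 60:.0f} min")
```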

Plasmid pKOS60-29-55 was again used as a template for PCR to amplify
sequence 3' to the AT domain using the primers BsrXho-fwd and NsiAfl-rev:

BsrXho-fwd  5'-GATGTACAGCTCGAGTCGGCACGCCCGGCCGCATC-3'
NsiAfl-rev  5'-CGACTCACTTAAGCCATGCATCC-3'

PCR conditions were as described above. The PCR fragment was cut with BsrGI
and AflII, gel isolated, and ligated into pKOS60-37-4 cut with Asp718 and AflII and
inserted into pKOS60-37-2 cut with BsrGI and AflII, to give the plasmids pKOS60-39-1
and pKOS60-39-13, respectively. These two plasmids can be digested with AvrII and
XhoI or NheI and XhoI, respectively, to insert heterologous AT domains specific for
malonyl, methylmalonyl, ethylmalonyl, or other extender units.
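
Each cassette primer is named for the restriction sites it carries. A minimal Python sketch, assuming the standard recognition sequences for these enzymes and the primer sequences as printed (with OCR spacing removed), locates those sites; the dictionary and function names are hypothetical.

```python
# Standard recognition sequences (all palindromic, so one strand suffices).
SITES = {
    "SpeI": "ACTAGT",
    "BglII": "AGATCT",
    "AvrII": "CCTAGG",
    "NheI": "GCTAGC",
    "BsrGI": "TGTACA",
    "XhoI": "CTCGAG",
    "AflII": "CTTAAG",
    "NsiI": "ATGCAT",
}

PRIMERS = {
    "SpeBgl-fwd": "CGACTCACTAGTGGGCAGATCTGG",
    "Avr-rev": "CACGCCTAGGCCGGTCGGTCTCGGGCCAC",
    "Nhe-rev": "GCGGCTAGCTGCTCGCCCATCGCGGGATGC",
    "BsrXho-fwd": "GATGTACAGCTCGAGTCGGCACGCCCGGCCGCATC",
    "NsiAfl-rev": "CGACTCACTTAAGCCATGCATCC",
}


def find_sites(seq):
    """Map enzyme name to the 0-based position of its first site in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


for primer_name, primer_seq in PRIMERS.items():
    print(primer_name, find_sites(primer_seq))
```

Each primer carries exactly the sites its name suggests, for example SpeI and BglII in SpeBgl-fwd and AflII and NsiI in NsiAfl-rev.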

Malonyl and methylmalonyl-specific AT domains were cloned from the
rapamycin cluster using PCR amplification with a pair of primers that introduce an AvrII
or NheI site at the 5' end and an XhoI site at the 3' end. The PCR conditions were as
given above and the primer sequences were as follows:

RATN1    5'-ATCCTAGGCGGGCRGGYGTGTCGTCCTTCGG-3'
(3' end of Rap KS sequence and universal for malonyl and methylmalonyl CoA),
RATMN2   5'-ATGCTAGCCGCCGCGTTCCCCGTCTTCGCGCG-3'
(Rap AT shorter version 5'- sequence and specific for malonyl CoA),
RATMMN2  5'-ATGCTAGCGGATTCGTCGGTGGTGTTCGCCGA-3'
(Rap AT shorter version 5'- sequence and specific for methylmalonyl CoA), and
RATC     5'-ATCTCGAGCCAGTASCGCTGGTGYTGGAAGG-3'
(Rap DH 5' sequence and universal for malonyl and methylmalonyl CoA).
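
The "universal" primers RATN1 and RATC contain the degenerate bases R, Y, and S, so each represents a small pool of oligonucleotides. A minimal Python sketch, assuming the standard IUPAC ambiguity codes (R = A or G, Y = C or T, S = C or G), enumerates the members of each pool; the function name `expand` is illustrative.

```python
from itertools import product

# Only the IUPAC codes that occur in these primers are listed.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",  # purine
    "Y": "CT",  # pyrimidine
    "S": "CG",  # strong (C or G)
}


def expand(primer):
    """Return every concrete sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


RATN1 = "ATCCTAGGCGGGCRGGYGTGTCGTCCTTCGG"
RATC = "ATCTCGAGCCAGTASCGCTGGTGYTGGAAGG"

for name, primer in (("RATN1", RATN1), ("RATC", RATC)):
    variants = expand(primer)
    print(name, "represents", len(variants), "sequences:")
    for variant in variants:
        print("  ", variant)
```

With two two-fold degenerate positions each, RATN1 and RATC each represent four sequences, all of which retain the AvrII (CCTAGG) or XhoI (CTCGAG) site near the 5' end.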

Because of the high sequence similarity in each module of the rapamycin cluster,
each primer was expected to prime any of the AT domains. PCR products representing
ATs specific for malonyl or methylmalonyl extenders were identified by sequencing
individual cloned PCR products. Sequencing also confirmed that the chosen clones
contained no cloning artifacts. Examples of hybrid modules with the rapamycin AT12
and AT13 domains are shown in a separate figure.

The AvrII-XhoI restriction fragment that encodes module 8 of the FK-520 PKS
with the endogenous AT domain replaced by the AT domain of module 12 of the
rapamycin PKS has the DNA sequence and encodes the amino acid sequence shown
below. The AT of rap module 12 is specific for incorporation of malonyl units.

20 AGATCTGGCAGCTCGCCGAAGCGCTGCTGACGCTCGTCCGGGAGAGCACC 50 
I WQLAEALLTL VR EST 
GCCGCCGTGCTCGGCCACGTGGGTGGCGAGGACATCCCCGCGACGGCGGC 100 

AAVLGHVG GEDI pATAA , cn 
GTTCAAGGACCTCGGCATCGACTCGCTCACCGCGGTCCAGCTGCGCAACG 150 

FKDLG I DSLTAVQIjRN 
CCCTCACCGAGGCGACCGGTGTGCGGCTGAACGCCACGGCGGTCTTCGAC 200 
AI.TEATGVRLNATAVFD 
TTCCCGACCCCGCACGTGCTCGCCGGGAAGCTCGGCGACGAACTGACCGG 250 
FPTPHVLAGK LGDELT G 
30 CACCCGCGCGCCCGTCGTGCCCCGGACCGCGGCCACGGCCGGTGCGCACG 300 
T RAPVV pRTAATAGAH 
ACGAGCCGCTGGCGATCGTGGGAATGGCCTGCCGGCTGCCCGGCGGGGTC 350 
DE PLAIVG MACRLPGG. V 
GCGTCACCCGAGGAGCTGTGGCACCTCGTGGCATCCGGCACCGACGCCAT 400 
35 ASPEELWHLVASGTDAI 

CACGGAGTTCCCGACGGACCGCGGCTGGGACGTCGACGCGATCTACGACC 450 

TE F PTDRG WDV DAIYD 
CGGACCCCGACGCGATCGGCAAGACCTTCGTCCGGCACGGTGGCTTCCTC 500 
PDPDAIGKTFVRHGGFI. 
40 ACCGGCGCGACAGGCTTCGACGCGGCGTTCTTCGGCATCAGCCCGCGCGA 550 
TGATGFD AAFFGISPRE 
GGCCCTCGCGATGGACCCGCAGCAGCGGGTGCTCCTGGAGACGTCGTGGG 600 

ALAMDPQQR vLLETSW 
AGGCGTTCGAAAGCGCCGGCATCACCCCGGACTCGACCCGCGGCAGCGAC 650 

EAFESAGITPDSTRGSD
ACCGGCGTGTTCGTCGGCGCCTTCTCCTACGGTTACGGCACCGGTGCGGA 700
TGVFVGAFSYGYGTGAD
CACCGACGGCTTCGGCGCGACCGGCTCGCAGACCAGTGTGCTCTCCGGCC 750
TDGFGATGSQTSVLSG
GGCTGTCGTACTTCTACGGTCTGGAGGGTCCGGCGGTCACGGTCGACACG 800
RLSYFYGLEGPAVTVDT
GCGTGTTCGTCGTCGCTGGTGGCGCTGCACCAGGCCGGGCAGTCGCTGCG 850
ACSSSLVALHQAGQSLR
CTCCGGCGAATGCTCGCTCGCCCTGGTCGGCGGCGTCACGGTGATGGCGT 900

SGECSLALVGGVTVMA 
CTCCCGGCGGCTTCGTGGAGTTCTCCCGGCAGCGCGGCCTCGCGCCX3GAC 950 
S PGG FVEFS RQRGLAPD 
5 GGCCGGGCGAAGGCGTTCGGCGCGGGTGCGGACGGCACGAGCTTCGCCGA 1000 
GRAKAFGAGADGTSFAE 
GGGTGCCGGTGTGCTGATCGTCGAGAGGCTCTCCGACGCCGAACGCAACG 1050 

GA G V L I VE RLS DAE R N 
GTCACACCGTCCTGGCGGTCGTCCGTGGTTCGGCGGTCAACCAGGATGGT 1100 
10 GHTVLAVVRG SAVNO DG 

GCCTCC AACGGGCTGTCGGCGCCGAACGGGCCGTCGCAGGAGCGGGTGAT 1150 

ASNGLSAPNGPSQE RVI 
CCGGCAGGCCCTGGCCAACGCCGGGCTCACCCCGGCGGACGTGGACGCCG 1200 
RQALANAGLT PADVDA 
1 5 TCGAGGCCCACGGC ACCGGCACCAGGCTGGGCG ACCCCATCGAGGC ACAG 1250 
VEAHGTGTRLGDPIEAQ 
GCGGT ACTGGCCACCTACGGACAGGAGCGCGCCACCCCCCTGCTGCTGGG 1300 

AVLATYGQERATP L LLG 
CTCGCTGAAGTCCAACATCGGCCACGCCCAGGCCGCGTCCGGCGTCGCCG 1350 
20 SLKSN IGHAQAASGVA 

GCATCATCAAGATGGTGCAGGCCCTCCGGCACGGGGAGCTGCCGCCGACG 1400 
GIIKMVQALRHGELPPT 
CTGCACGCCGACGAGCCGTCGCCGCACGTCGACTGGACGGCCGGCGCCGT 1450 
LHADE pSPHVDWTAGAV 
25 CGAACTGCTGACGTCGGCCCGGCCGTGGCCCGAGACCGACCGGCCTAGGC 1500 
ELLT SARPWPETDRPR 
GGGCAGGCGTGTCGTCCTTCGGGATCAGTGGCACCAACGCCCACGTCATC 1550 
RAGVSS FGISGTNAHVI 
CTGGAAAGCGCACCCCCCACTCAGCCTGCGGACAACGCGGTGATCGAGCG 1600 
30 LESAPPTQPADNAVIER 

GGCACCGGAGTGGGTGCCGTTGGTGATTTCGGCCAGGACCCAGTCGGCTT 1650 

A P E W V P L V I S A R T Q S A 
TGACTGAGCACGAGGGCCGGTTGCGTGCGTATCTGGCGGCGTCGCCCGGG 1700 
LTEHEGRLRAYLAASPG 
35 GTGGATATGCGGGCTGTGGCATCGACGCTGGCGATGACACGGTCGGTGTT 1750 

V DM RA VAS T LAMT RS V F 
CGAGCACCGTGCCGTGCTGCTGGGAGATGACACCGTCACCGGCACCGCTG 1800 

EHRAVLLGDDTVTG TA 
TGTCTGACCCTCGGGCGGTGTTCGTCTTCCCGGGACAGGGGTCGCAGCGT 1850 
40 VSD PRAVFVFPGQGSQR 

GCTGGCATGGGTGAGGAACTGGCCGCCGCGTTCCCCGTCTTCGCGCGGAT 1 900 

AGMGE ELAAAFPVFARI 
CCATCAGCAGGTGTGGGACCTGCTCGATGTGCCCGATCTGGAGGTGAACG 1950 
HQQVW DLLDVPDLEV N 
45 AGACCGGTTACGCCCAGCCGGCCCTGTTCGCAATGCAGGTGGCTCTGTTC 2000 
E T G Y A Q PAL F A M Q V A L F 
GGGCTGCTGGAATGGTGGGGTGTACGACCGGACGCGGTGATCGGCCATTC 2050 

G L L E S W G V R P D A V I G H S 
GGTGGGTGAGCTTGCGGCTGCGTATGTGTCCGGGGTGTGGTCGTTGGAGG 2100 
50 VGELAAAYVSGVWS LE 

ATGCCTGCACTTTGGTGTCGGCGCGGGCTCGTCTGATGCAGGCTCTGCCC 2150 
DACTLV SARARLMQALP 
GCGGGTGGGGTGATGGTCGCTGTCCCGGTCTCGGAGGATGAGGCCCGGGC 2200 
AGGVMVAVPVSEDE ARA 
55 CGTGCTGGGTGAGGGTGTGGAGATCGCCGCGGTCAACGGCCCGTCGTCGG 2250 
VLG EGVEIAAVNG PS S 
TGGTTCTCTCCGGTGATGAGGCCGCCGTGCTGCAGGCCGCGGAGGGGCTG 2300 

V V L S G D E AAV L Q A A E G L 
GGGAAGTGGACGCGGCTGGCGACCAGCCACGCGTTCCATTCCGCCCGTAT 2350 

60 G KW T R LAT S HA FH S AR M 

GGAACCCATGCTGGAGGAGTTCCGGGCGGTCGCCGAAGGCCTGACCTACC 2400 

E PMLEE FRAVAEGLTY 
GGACGCCGCAGGTCTCCATGGCCGTTGGTGATCAGGTGACCACCGCTGAG 2 4 50 
R T P Q V S M A V G D Q V T T A E 
TACTGGGTGCGGGAGGTCCGGGACACGGTCCGGTTCGGCGAGCAGGTGGC 2500

YWVRQVRDTVRFG E Q VA 
CTCGTACGAGGACGCCGTGTTCGTCGAGCTGGGTGCCGACCGGTCACTGG 2550 

SYEDAVFVELGADRSL 
CCCGCCTGGTCGACGGTGTCGCGATGCTGCACGGCGACCACGAAATCCAG 2600 
ARLVDGVA MLHGDHEIQ 
GCCGCGATCGGCGCCCTGGCCCACCTGTATGTCAACGGCGTCACGGTCGA 2650 

AA I G ALAH L Y V NG V T V D 
CTGGCCCGCGCTCCTGGGCGATGCTCCGGCAACACGGGTGCTGGACCTTC 2700 

WPA LLGDAPATR VLDL 
CGACAT ACGCCTTCCAGCACCAGCGCTACTGGCTCGAGTCGGCACGCCCG 2750 
PTYAFQHQRYWLESAR P 
GCCGCATCCGACGCGGGCCACCCCGTGCTGGGCTCCGGTATCGCCCTCGC 2800 
AASDAGHPVLGSGIAL A 
15 CGGGTCGCCGGGCCGGGTGTTCACGGGTTCCGTGCCGACCGGTGCGGACC 2850 
GSPGRVFTGSVPTGAD 
GCGCGGTGTTCGTCGCCGAGCTGGCGCTGGCCGCCGCGGACGCGGTCGAC 2900 
RAVFVAELALAAADAVD 
TGCGCCACGGTCGAGCGGCTCGACATCGCCTCCGTGCCCGGCCGGCCGGG 2 950 
20 CATVERLDIASVPGRPG 

CCATGGCCGGACGACCGTACAGACCTGGGTCGACGAGCCGGCGGACGACG 3000 

HGRTTVQTWVDEPADD 
GCCGGCGCCGGTTCACCGTGCACACCCGCACCGGCGACGCCCCGTGGACG 3050 
GRRRFTVHTRTG DAPWT 
25 CTGCACGCCGAGGGGGTGCTGCGCCCCCATGGCACGGCCCTGCCCGATGC 3100 
LHAEGVLRPHGTALPDA 
GGCCGACGCCGAGTGGCCCCCACCGGGCGCGGTGCCCGCGGACGGGCTGC 3150 

ADAEWPPPGAVPADGL 
CGGGTGTGTGGCGCCGGGGGGACCAGGTCTTCGCCGAGGCCGAGGTGGAC 3200 
30 PGVW RRG DQVFAEA EVD 

GGACCGGACGGTTTCGTGGTGCACCCCGACCTGCTCGACGCGGTCTTCTC 3250 

G P DG FVV H P DLL DAV F S 
CGCGGTCGGCGACGGAAGCCGCCAGCCGGCCGGATGGCGCGACCTGACGG 3300 
AVGDGSRQ PAGWRDLT 
35 TGCACGCGTCGGACGCCACCGTACTGCGCGCCTGCCTCACCCGGCGCACC 3350 
VHAS DATVLRACLTRRT 
GACGGAGCCATGGGATTCGCCGCCTTCGACGGCGCCGGCCTGCCGGTACT 3400 

DGAMGFAAFDGAGLPVL 
CACCGCGGAGGCGGTGACGCTGCGGGAGGTGGCGTCACCGTCCGGCTCCG 3450 
40 TAEAVTLREVASPSG S 

AGGAGTCGGACGGCCTGCACCGGTTGGAGTGGCTCGCGGTCGCCGAGGCG 3500 
EES DGLHRLEWLAVAEA 
GTCTACGACGGTGACCTGCCCGAGGGACATGTCCTGATCACCGCCGCCCA 3550 
VYDGDLPEGHVLITAAH 
45 CCCCGACGACCCCGAGGACATACCCACCCGCGCCCACACCCGCGCCACCC 3600 
PDDPEDIPTRAHTRAT 
GCGTCCTGACCGCCCTGCAACACCACCTCACCACCACCGACCACACCCTC 3650 

R V L T A L Q H H L T T T D H T L 
ATCGTCCACACCACCACCGACCCCGCCGGCGCCACCGTCACCGGCCTCAC 3700 
50 IVHTTTDPAGATVTGLT 

CCGCACCGCCCAGAACGAACACCCCCACCGCATCCGCCTCATCGAAACCG 3750 

RTAQNEHPHRIRLIET 
ACCACCCCCACACCCCCCTCCCCCTGGCCCAACTCGCCACCCTCGACCAC 3800 
DHPHTPLPLAQLATLDH 
55 CCCCACCTCCGCCTCACCCACCACACCCTCCACCACCCCCACCTCACCCC 3850 
PHLRLTHHTLHHPHLTP- 
CCTCCACACCACCACCCCACCCACCACCACCCCCCTCAACCCCGAACACG 3900 

LHTTTPPTTTPLNPEH 
CCATCATCATCACCGGCGGCTCCGGCACCCTCGCCGGCATCCTCGCCCGC 3950 . 

60 at I I TGG SGTLAG I LAR _ 
CACCTGAACCACCCCCACACCTACCTCCTCTCCCGCACCCCACCCCCCGA 4000 

HLNHPHTYLLSRTPPPD 
CGCCACCCCCGGCACCCACCTCCCCTGCGACGTCGGCGACCCCCACCAAC 4050 
ATPGTHLPCDV GDPHQ 
TCGCCACCACCCTCACCCACATCCCCCAACCCCTCACCGCCATCTTCCAC 4100

LATTLT H I PQPLTAI FH 
ACCGCCGCCACCCTCGACGACGGCATCCTCCACGCCCTCACCCCCGACCG 4150 
TAATLDDGILHALTPDR 
5 CCTCACCACCGTCCTCCACCCCAAAGCCAACGCCGCCTGGCACCTGCACC 4200 
LT T V L H P KANAAWH L H 
ACCTCACCCAAAACCAACCCCTCACCCACTTCGTCCTCTACTCCAGCGCC 4 250 
HLTQNQP LT H FVLY S SA 
GCCGCCGTCCTCGGCAGCCCCGGACAAGGAAACTACGCCGCCGCCAACGC 4300 
10 AAVLGS p .GQGNYAA ANA 

CTTCCTCGACGCCCTCGCCACCCACCGCCACACCCTCGGCCAACCCGCCA 4350 

FLDALATHRHTLGQPA 
CCTCCATCGCCTGGGGCATGTGGCACACCACCAGCACCCTCACCGGACAA 4400 
TSIAWGM.WHTTS TLT G Q 
1 5 CTCGACGACGCCGACCGGG ACCGCATCCGCCGCGGCGGTTTCCTCCCGAT 4450 
LDDADRDR1 RRGGFLPI 
CACGGACGACGAGGGCATGGGGATGCAT 
T D D E G 
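
The reading frame used in the listing above can be checked by translating the first 50 nucleotides. The following Python sketch assumes the standard genetic code and a frame beginning at the third nucleotide, which is the offset that reproduces the amino acids printed beneath the first line of the listing; the helper names are illustrative.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, offset=0):
    """Translate complete codons of dna starting at the given 0-based offset."""
    dna = dna.upper().replace(" ", "")
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Nucleotides 1-50 of the listing; the cassette begins with the BglII site AGATCT.
first_line = "AGATCTGGCAGCTCGCCGAAGCGCTGCTGACGCTCGTCCGGGAGAGCACC"

print(translate(first_line, offset=2))
```

The translation begins I-W-Q-L-A-E-A-L-L-T-L-V-R-E-S-T, matching the residues shown under the first line of the sequence.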

The AvrII-XhoI restriction fragment that encodes module 8 of the FK-520 PKS
with the endogenous AT domain replaced by the AT domain of module 13 (specific for
methylmalonyl CoA) of the rapamycin PKS has the DNA sequence and encodes the
amino acid sequence shown below.

AGATCTGGCAGCTCGCCGAAGCGCTGCTGACGCTCGTCCGGGAGAGCACC 50 
25 QLAEALLTLVREST 

GCCGCCGTGCTCGGCCACGTGGGTGGCGAGGACATCCCCGCGACGGCGGC 100 

AAVLGHV GGEDIPATAA 
GTTCAAGGACCTCGGCATCGACTCGCTCACCGCGGTCCAGCTGCGCAACG 150 
F K DLG 1 D S LTAVQL RN 
30 CCCTCACCGAGGCGACCGGTGTGCGGCTGAACGCCACGGCGGTCTTCGAC 200 
AL TEATGV RLNATAVFD 
TTCCCGACCCCGCACGTGCTCGCCGGGAAGCTCGGCGACGAACTGACCGG 250 

FPT pHVLAGKLGDELTG 
CACCCGCGCGCCCGTCGTGCCCCGGACCGCGGCCACGGCCGGTGCGCACG 300 
35 T RA PVV P RTAATAG AH 

ACGAGCCGCTGGCGATCGTGGGAATGGCCTGCCGGCTGCCCGGCGGGGTC 350 
DEPLAIVGMACRLPGGV 
GCGTCACCCGAGG AGCTGTGGCACCTCGTGGCATCCGGCACCGACGCCAT 400 
ASPEELWHLVASGTDA1 
40 CACGGAGTTCCCGACGGACCGCGGCTGGGACGTCGACGCGATCTACGACC 450 
TEFPTDRGWDVDAIYD 
CGGACCCCGACGCGATCGGCAAGACCTTCGTCCGGCACGGTGGGTTCCTC 500 
PDPDAIGKTFV R H G G F L 
ACCGGCGCGACAGGGTTCGACGCGGCGTTCTTCGGCATCAGCCCGCGCGA 550 
45 T GATG FD AAFFGI SPRE 

GGCCCTCGCGATGGACCCGCAGCAGCGGGTGCTCCTGGAGACGTCGTGGG 600 

ALAMDPQQRVLLETS W 
AGGCGTTCGAAAGCGCCGGCATCACCCCGGACTCGACCCGCGGCAGCGAC 650 
EAFESAGITPDSTRGSD 
50 ACCGGCGTGTTCGTCGGCGCCTTCTCCTACGGTTACGGCACCGGTGCGGA 700 
T GVFVGAFSYGYGT GA D 
CACCGACGGCTTCGGCGCGACCGGCTCGCAGACCAGTGTGCTCTCCGGCC 750 

TDGFG A TGSQTSV LSG 
GGCTGTCGTACTTCTACGGTCTGGAGGGTCCGGCGGTCACGGTCGACACG 800 
55 RLSYFYGLEGPAVT VDT 

GCGTGTTCGTCGTCGCTGGTGGCGCTGCACCAGGCCGGGCAGTCGCTGCG 850 

ACSSSLVALHQAGQSLR 
CTCCGGCGAATGCTCGCTCGCCCTGGTCGGCGGCGTCACGGTGATOGCGT 900 
SGECSLALVGGVTVMA 
60 CTCCCGGCGGCTTCGTGGAGTTCTCCCGGCAGCGCGGCCTCGCGCGGGAC 950 
SPGGFVEFSRQRGLAPD
GGCCGGGCGAAGGCGTTCGGCGCGGGTGCGGACGGCACGAGCTTGGGGGA 1-000 

GRAKAFGAGADGTSFAE 
GGGTGCCGGTGTGCTGATCGTCG AGAGGCTCTCCGACGCCGAACGCAACG 1050 
5 G AGVL IV E R LS DAE RN 

GTCACACCGTCCTGGCGGTCGTCCGTGGTTCGGCGGTCAACCAGGATGGT 1100 
GHTVLAVVRGSAVNQDG 
GCCTCCAACGGGCTGTCGGCGCCGAACGGGCCGTCGCAGGAGCGGGTGAT 1150 
A S N G L SA PNG P S Q E RV I 
1 0 CCGGC AGGCCCTGGCCAACGCCGGGCTCACCCCGGCGGACGTGGACGCCG 1200 
RQALANAGL TPADVDA 
TCGAGGCCCACGGCACCGGCACCAGGCTGGGCGACCCCATCGAGGCACAG 1250 
V E A H G.T G T RLG D P I EAQ 
GCGGT ACTGGCCACCTACGGACAGGAGCGCGCCACCCCCCTGCTGCTGGG 1300 
15 AVLATY G QERATPLLLG 

CTCGCTGAAGTCCAACATCGGCCACGCCCAGGCCGCGTCCGGCGTCGCCG 1350 

S LKSNI G HAQAAS GVA 
GCATCATCAAGATGGTGCAGGCCCTCCGGCACGGGGAGCTGCCGCCGACG 14 00 
GI I KMVQALRHGELP PT 
20 CTGCACGCCGACGAGCCGTCGCCGCACGTCGACTGGACGGCCGGCGCCGT 14 50 
LHADEPSPHVDWTAGAV 
CGAACTGCTGACGTCGGCCCGGCCGTGGCCCGAGACCGACCGGCCTAGGC 1500 

ELLTSARPWPETDRPR 
GGGCGGGCGTGTCGTCCTTCGGAGTCAGCGGCACCAACGCCCACGTC ATC 1550 
25 RAGVS S FGVSGTNAHV I 

CTGGAGAGCGCACCCCCCGCTCAGCCCGCGGAGGAGGCGCAGCCTGTTGA 1600 

L E S AP P AQPAEEAQ PVE 
GACGCCGGTGGTGGCCTCGGATGTGCTGCCGCTGGTGATATCGGCCAAGA 1650 
TPVVASDVLPLVISAK 
30 CCCAGCCCGCCCTGACCGAACACGAAGACCGGCTGCGCGCCTACCTGGCG 1700 
T Q P A L T E H E D RL RAY L A 
GCGTCGCCCGGGGCGGATATACGGGCTGTGGCATCGACGCTGGCGGTGAC 1750 

A S P GA D I RAVA S T L A V T 
ACGGTCGGTGTTCGAGCACCGCGCCGTACTCCTTGGAGATGACACCGTCA 1800 
35 RSVFEHRAVLLGDDTV 

CCGGCACCGCGGTGACCGACCCCAGGATCGTGTTTGTCTTTCCCGGGCAG 1850 
T GT A VT DPRIVFV F PG Q 
GGGTGGCAGTG GCTGGGGATGGGCAGTGCACTGCGCGATTCGTCGGTGGT 1900 
GWQWLGMGSALRDSSVV 
40 GTTCGCCGAGCGGATGGCCGAGTGTGCGGCGGCGTTGCGCGAGTTCGTGG 1950 
FAERMAECAAALREF V 
ACTGGGATCTGTTCACGGTTCTGGATGATCCGGCGGTGGTGGACCGGGTT 2000 
DWDLFTVLDDPAVVDRV 
GATGTGGTCC AGCCCGCTTCCTGGGCGATGATGGTTTCCCTGGCCGCGGT 2050 
45 D V V Q PA S WAM MV S L AA V . 

GTGGCAGGCGGCCGGTGTGCGGCCGGATGCGGTGATCGGCCATTCGCAGG 2100 

W Q A A G V R P D AV I G H S Q 
GTGAGATCGCCGCAGCTTGTGTGGCGGGTGCGGTGTCACTACGCGATGCC 2150 
G E I AAACVAG.AV S L R DA 
50 GCCCGGATCGTGACCTTGCGCAGCCAGGCGATCGCCCGGGGCCTGGCGGG 2200 
ARIVTLRSQAIARGLAG 
CCGGGGCGCGATGGCATCCGTCGCCCTGCCCGCGCAGGATGTCGAGCTGG 2250 

RGAMASVALPAQDVEL 
TCGACGGGGCCTGGATCGCCGCCCACAACGGGCCCGCCTCCACCGTGATC 2300 
55 VDGAWIAAHNGPASTVI 

GCGGGCACCCCGG AAGCGGTCGACCATGTCCTCACCGCTCATGAGGCACA 2350 

AGT P EA V DHVLTA H E AQ 
AGGGGTGCGGGTGCGGCGGATCACCGTCGACTATGCCTCGCACACCCCGC 24 00 
GVRVRRITVDYASHTP 
60 ACGTCGAGCTGATCCGCGACGAACTACTCGACATCACTAGCGACAGCAGC 24 50 
HVELIRDELLDITSDSS 
TCGCAGACCCCGCTCGTGCCGTGGCTGTCGACCGTGGACGGCACCTGGGT 2500 

SQTPLVPWLSTVDGTWV 
CGACAGCCCGCTGGACGGGGAGTACTGGTACCGGAACCTGCGTGAACCGG 2550 
DSPLDGEYWYRNLREP
TCGGTTTCCACCCCGCCGTCAGCCAGTTGCAGGCCCAGGGCGACACCGTG 2600 
VGFHPAVSQLQAQGDTV 
TTCGTCGAGGTCAGCGCCAGCCCGGTGTTGTTGCAGGCGATGGACGACGA 2650 
FVEVSAS PVLLQAMDDD 
. TGTCGTCACGGTTGCCACGCTGCGTCGTGACGACGGCGACGCCACCCGGA 2700 

VVTVATLRRDDGDATR 
TGCTCACCGCCCTGGCACAGGCCTATGTCCACGGCGTCACCGTCGACTGG 27 50 
MLTAL AQAYVHGVTVD W 
CCCGCCATCCTCGGCACCACCACAACCCGGGTACTGGACCTTCCGACCTA 2800 

PAI LGTTTTRVLDLPT Y 
CGCCTTCCAACACCAGCGGTACTGGCTCGAGTCGGCACGCCCGGCCGCAT 2850 

AF QHQRYWLESARPAA 
CCGACGCGGGCCACCCCGTGCTGGGCTCCGGTATCGCCCTCGCCGGGTCG 2900 
SDAGHPV LGSGIA LAGS 
CCGGGCCGGGTGTTCACGGGTTGCGTGCCGACCGGTGCGGACCGCGCGGT 2950 

PGRVFTGSVPTGADRAV 
GTTCGTCGCCGAGCTGGCGCTGGCCGCCGCGGACGCGGTCGACTGCGCCA 3000 

FVAELALAAA DAVDCA 
CGGTCGAGCGGCTCGACATCGCCTCCGTGCCCGGCCGGCCGGGCCATGGC 3050 
TVERLDIASVPGRPGHG 
CGGACGACCGTACAGACCTGGGTCGACGAGCCGGCGGACGACGGCCGGCG 3100 

RTTVQTWVDEPADD G RR 
CCGGTTCACCGTGCACACCCGCACCGGCGACGCCCCGTGGACGCTGCACG 3150 

RFTVHTRTGDAPWTLH 
CCGAGGGGGTGCTGCGCCCCCATGGCACGGCCCTGCCCGATGCGGCCGAC 3200 
AEGVLRPHGTALPDAAD 
GCCGAGTGGCCCCCACCGGGCGCGGTGCCCGCGGACGGGCTGCCGGGTGT 3250 

AEWPPPGAVPADGLPGV 
GTGGCGCCGGGGGGACCAGGTCTTCGCCGAGGCCGAGGTGGACGGACCGG 3300 

WRRG DQV FAE AEVDG P 
ACGGTTTCGTGGTGCACCCCGACCTGCTCGACGCGGTCTTCTCCGCGGTC 3350 
DGFVVH PDLLD AVFSAV 
GGCGACGGAAGCCGCCAGCCGGCCGGATGGCGCGACCTGACGGTGCACGC 3400 

GDGSRQP AGWRDLTVHA 
GTCGGACGCCACCGTACTGCGCGCCTGCCTCACCCGGCGCACCGACGGAG 34 50 

SDATVLRACLTRRTDG 
CCATGGGATTCGCCGCCTTCGACGGCGCCGGCCTGCCGGTACTCACCGCG 3500 
AMG FAA FDGAGLPVLTA 
GAGGCGGTGACGCTGCGGGAGGTGGCGTCACCGTCCGGCTCCGAGGAGTC 3550 

EAVTLREVA S PSGSEE S 
GGACGGCCTGCACCGGTTGGAGTGGCTCGCGGTCGCCGAGGCGGTCTACG 3600 

D G L H R L E W L AVA E AV Y 
ACGGTGACCTGCCCGAGGGACATGTCCTGATCACCGGCGCCCACCCCGAC 3650 
DGDLPEGHVLITA A HP D 
GACCCCGAGGACATACCCACCCGCGCCCACACCCGCGCCACCCGCGTCCT 3700 

D P E D I P T R A H T R A T R V L 
GACCGCCCTGCAACACCACCTCACCACCACCGACCACACCCTGATCGTCC 3750

T A L Q H H L T T T D H T L I V 
ACACCACCACCGACCCCGCCGGCGCCACCGTCACCGGCCTCACCCGCACC 3800 
HTTTDPAG ATV TG L TRT 
GCCCAGAACGAACACCCCCACCGCATCCGCCTCATCGAAACCGACCACCC 3850 

AQNEHPH RIRLIETD HP 
CCACACCCCCCTCCCCCTGGCCCAACTCGCCACCCTCGACCACCCCCACC 3900 

H T P L P L AQLATL D H P H 
TCCGCCTCACCCACCACACCCTCCACCACCCCCACCTCACCCCCCTCCAC 3950 
L RL T H H T L H H P H LT P LH 
ACCACCACCCCACCCACCACCACCCCCCTCAACCCCGAACACGCCATCAT 4000

TTTPPTTTPLNPEHAII
CATCACCGGCGGCTCCGGCACCCTCGCCGGCATCCTCGCCCGCCACCTGA 4050 

I TGGSGTLAG ILARH L 
ACCACCCCCACACCTACCTCCTCTCCCGCACCCCACCCCCCGACGCCACC 4100
NHPHTYLLSRTPPPDAT 
CCCGGCACCCACCTCCCCTGCGACGTCGGCGACCCCCACCAACTCGCCAC 4150 
PGTHLPCDVGDPHQLAT
CACCCTCACCCACATCCCCCAACCCCTCACCGCCATCTTCCACACCGCCG 4200

TLTHIPQPLTAIFHTA 
CCACCCTCGACGACGGCATCCTCCACGCCCTCACCCCCGACCGCCTCACC 4250 
ATLDDG ILHALTPDRLT 
ACCGTCCTCCACCCCAAAGCCAACGCCGCCTGGCACCTGCACCACCTCAC 4300 

TVLH PKANAAW HLHHLT 
CCAAAACCAACCCCTCACCCACTTCGTCCTCTACTCCAGCGCCGCCGCCG 4350

QNQPLTHFVLYSSAAA
TCCTCGGCAGCCCCGGACAAGGAAACTACGCCGCCGCCAACGCCTTCCTC 4400 
VLGSPGQGNYAAANAF L 
GACGCCCTCGCCACCCACCGCCACACCCTCGGCCAACCCGCCACCTCCAT 4450

DALATHRHTLGQPATSI
CGCCTGGGGCATGTGGCACACCACCAGCACCCTCACCGGACAACTCGACG 4500

AWGMWHTTSTLTGQLD
ACGCCGACCGGGACCGCATCCGCCGCGGCGGTTTCCTCCCGATCACGGAC 4550
DADRDRIRRGGFLPITD
GACGAGGGCATGGGGATGCAT 

D E G 

The NheI-XhoI restriction fragment that encodes module 8 of the FK-520 PKS 
with the endogenous AT domain replaced by the AT domain of module 12 (specific for 
malonyl CoA) of the rapamycin PKS has the DNA sequence and encodes the amino acid 
sequence shown below. 

AGATCTGGCAGCTCGCCGAAGCGCTGCTGACGCTCGTCCGGGAGAGCACC 50

QLAEALLTLVREST 
GCCGCCGTGCTCGGCCACGTGGGTGGCGAGGACATCCCCGCGACGGCGGC 100 

AA VLGHVGG EDIPATAA 
GTTCAAGGACCTCGGCATCGACTCGCTCACCGCGGTCCAGCTGCGCAACG 150 

F KDLGIDSL TAVQLRN 
CCCTCACCGAGGCGACCGGTGTGCGGCTGAACGCCACGGCGGTCTTCGAC 200 
ALTEATGVRLNATAVFD
TTCCCGACCCCGCACGTGCTCGCCGGGAAGCTCGGCGACGAACTGACCGG 250

FPTPHVLAGKLGDELTG
CACCCGCGCGCCCGTCGTGCCCCGGACCGCGGCCACGGCCGGTGCGCACG 300 

TRAP V VPRTAATAGAH 
ACGAGCCGCTGGCGATCGTGGGAATGGCCTGCCGGCTGCCCGGCGGGGTC 350 
DE PLAI VGMACRLPGGV 
GCGTCACCCGAGGAGCTGTGGCACCTCGTGGCATCCGGCACCGACGCCAT 400 

AS PEELWH LVASGTDAI 
CACGGAGTTCCCGACGGACCGCGGCTGGGACGTCGACGCGATCTACGACC 450 

TEFPTDRGWDVDAIYD 
CGGACCCCGACGCGATCGGCAAGACCTTCGTCCGGCACGGTGGCTTCCTC 500 
PDPDAIG KTF V R H G G F L 
ACCGGCGCGACAGGCTTCGACGCGGCGTTCTTCGGCATCAGCCCGCGCGA 550 

TGA T GF DAAF FGISPRE 
GGCCCTCGCGATGGACCCGCAGCAGCGGGTGCTCCTGGAGACGTCGTGGG 600 

ALAMDPQQRVLLETSW 
AGGCGTTCGAAAGCGCCGGCATCACCCCGGACTCGACCCGCGGCAGCGAC 650 
EAF ESA G I T PDS T RGS D 
ACCGGCGTGTTCGTCGGCGCCTTCTCCTACGGTTACGGCACCGGTGCGGA 700 

TGVFVGAFSYGYGTGAD 
CACCGACGGCTTCGGCGCGACCGGCTCGCAGACCAGTGTGCTCTCCGGCC 750 

TDGFGATGSQTSVLSG 
GGCTGTCGTACTTCTACGGTCTGGAGGGTCCGGCGGTCACGGTCGACACG 800 
RLSYFYGLEGPAVTVDT 
GCGTGTTCGTCGTCGCTGGTGGCGCTGCACCAGGCCGGGCAGTCGCTGCG 850 

ACSSSLVALHQAGQSLR 
CTCCGGCGAATGCTCGCTCGCCCTGGTCGGCGGCGTCACGGTGATGGCGT 900 
SGECSLALVGGVTVMA 
CTCCCGGCGGCTTCGTGGAGTTCTCCCGGCAGCGCGGCCTCGCGCCGGAC 950
SPGGFVEFSRQRGLAP D 
GGCCGGGCGAAGGCGTTCGGCGCGGGTGCGGACGGCACGAGCTTCGCCGA 1000 
GRAK AFGAGADGTS FAE 
GGGTGCCGGTGTGCTGATCGTCGAGAGGCTCTCCGACGCCGAACGCAACG 1050
GAGVLIVERLSDAERN
GTCACACCGTCCTGGCGGTCGTCCGTGGTTCGGCGGTCAACCAGGATGGT 1100 
G H T V L A V V R G S A V N Q D G 
GCCTCCAACGGGCTGTCGGCGCCGAACGGGCCGTCGCAGGAGCGGGTGAT 1150 
ASNGLSAPNGPSQERVI

CCGGCAGGCCCTGGCCAACGCCGGGCTCACCCCGGCGGACGTGGACGCCG 1200 

R QALANAGL T PAD VDA 
TCGAGGCCCACGGCACCGGCACCAGGCTGGGCGACCCCATCGAGGCACAG 1250 
VEAHGTGTRLGDPI EA Q 
GCGGTACTGGCCACCTACGGACAGGAGCGCGCCACCCCCCTGCTGCTGGG 1300
AVLATYGQERATPLLLG
CTCGCTGAAGTCCAACATCGGCCACGCCCAGGCCGCGTCCGGCGTCGCCG 1350 

S LKS N I G HAQAA S G VA 
GCATCATCAAGATGGTGCAGGCCCTCCGGCACGGGGAGCTGCCGCCGACG 1400 
GIIKMVQALRHGELPPT

CTGCACGCCGACGAGCCGTCGCCGCACGTCGACTGGACGGCCGGCGCCGT 1450 

L H A D E P S P H V D W T A G A V 
CGAACTGCTGACGTCGGCCCGGCCGTGGCCCGAGACCGACCGGCCACGGC 1500 
ELLTSARPWPETDRPR 
GTGCCGCCGTCTCCTCGTTCGGGGTGAGCGGCACCAACGCCCACGTCATC 1550
RAAVSSFGVSGTNAHVI
CTGGAGGCCGGACCGGTAACGGAGACGCCCGCGGCATCGCCTTCCGGTGA 1600 

LEAG PVTETPAASPSGD 
CCTTCCCCTGCTGGTGTCGGCACGCTCACCGGAAGCGCTCGACGAGCAGA 1650 
LPLLVSARSPEALDEQ

TCCGCCGACTGCGCGCCTACCTGGACACCACCCCGGACGTCGACCGGGTG 1700
IRRLRAYLDTTPDVDRV
GCCGTGGCACAGACGCTGGCCCGGCGCACACACTTCGCCCACCGCGCCGT 1750 
AVAQTLARRTH FAH RAV 
GCTGCTCGGTGACACCGTCATCACCACACCCCCCGCGGACCGGCCCGACG 1800
LLGDTVITTPPADRPD
AACTCGTCTTCGTCTACTCCGGCCAGGGCACCCAGCATCCCGCGATGGGC 1850 
ELVFVY SGQGTQHP A MG 
GAGCAGCTAGCCGCCGCGTTCCCCGTCTTCGCGCGGATCCATCAGCAGGT 1900
EQLAAAFPVFARIHQQV

GTGGGACCTGCTCGATGTGCCCGATCTGGAGGTGAACGAGACCGGTTACG 1950
W D L L D V P D L E V N E T G Y 
CCCAGCCGGCCCTGTTCGCAATGCAGGTGGCTCTGTTCGGGCTGCTGGAA 2000 
AQ PAL FA MQVAL FGLL E 
TCGTGGGGTGTACGACCGGACGCGGTGATCGGCCATTCGGTGGGTGAGCT 2050
SWGVRPDAVIGHSVGEL
TGCGGCTGCGTATGTGTCCGGGGTGTGGTCGTTGGAGGATGCCTGCACTT 2100 

A A A Y V S G V W S L E D A C T 
TGGTGTCGGCGCGGGCTCGTCTGATGCAGGCTCTGCCCGCGGGTGGGGTG 2150
LVSARARLMQALPAGGV

ATGGTCGCTGTCCCGGTCTCGGAGGATGAGGCCCGGGCCGTGCTGGGTGA 2200 

MVAVP VSEDEARAVLGE 
GGGTGTGGAGATCGCCGCGGTCAACGGCCCGTCGTCGGTGGTTCTCTCCG 2250 
GVEIAAVNGPSSVVLS 
GTGATGAGGCCGCCGTGCTGCAGGCCGCGGAGGGGCTGGGGAAGTGGACG 2300
G D E A A V L Q A A E G L G K W T 
CGGCTGGCGACCAGCCACGCGTTCCATTCCGCCCGTATGGAACCCATGCT 2350 

RLATS HAFHSARME PML 
GGAGGAGTTCCGGGCGGTCGCCGAAGGCCTGACCTACCGGACGCCGCAGG 2400
EEFRAVAEGLTYRTPQ

TCTCCATGGCCGTTGGTGATCAGGTGACCACCGCTGAGTACTGGGTGCGG 2450
VSMAVGDQVTTAEYWVR
CAGGTCCGGGACACGGTCCGGTTCGGCGAGCAGGTGGCCTCGTACGAGGA 2500 
QVRDTVRFGEQVAS Y ED 
CGCCGTGTTCGTCGAGCTGGGTGCCGACCGGTCACTGGCCCGCCTGGTCG 2550

A V F V E L G A D R S L A R L V 
ACGGTGTCGCGATGCTGCACGGCGACCACGAAATCCAGGCCGCGATCGGC 2600 
DGVAMLHGDHEIQAAIG
GCCCTGGCCCACCTGTATGTCAACGGCGTCACGGTCGACTGGCCCGCGCT 2650
ALAHLYVNGVTVDWPAL 
CCTGGGCGATGCTCCGGCAACACGGGTGCTGGACCTTCCGACATACGCCT 2700 

L G DA PA TRV LDLPTYA 
TCCAGCACCAGCGCTACTGGCTCGAGTCGGCACGCCCGGCCGCATCCGAC 2750 
FQHQRYWLESARPAASD

GCGGGCCACCCCGTGCTGGGCTCCGGTATCGCCCTCGCCGGGTCGCCGGG 2800 

AGHPVLGSGIALAGSPG
CCGGGTGTTCACGGGTTCCGTGCCGACCGGTGCGGACCGCGCGGTGTTCG 2850 
RVFT GSVPTGADRAVF 
TCGCCGAGCTGGCGCTGGCCGCCGCGGACGCGGTCGACTGCGCCACGGTC 2900
VAELALAAADAVDCATV 
GAGCGGCTCGACATCGCCTCCGTGCCCGGCCGGCCGGGCCATGGCCGGAC 2950

ERLDIA SVPGRPGHGRT 
GACCGTACAGACCTGGGTCGACGAGCCGGCGGACGACGGCCGGCGCCGGT 3000 
TVQTWVDEPADDGRRR

TCACCGTGCACACCCGCACCGGCGACGCCCCGTGGACGCTGCACGCCGAG 3050 
FTVHTRTGDAPWTLHAE 
GGGGTGCTGCGCCCCCATGGCACGGCCCTGCCCGATGCGGCCGACGCCGA 3100
GVLRPHGTALPDAADAE
GTGGCCCCCACCGGGCGCGGTGCCCGCGGACGGGCTGCCGGGTGTGTGGC 3150
WPPPGAVPADGLPGVW 
GCCGGGGGGACCAGGTCTTCGCCGAGGCCGAGGTGGACGGACCGGACGGT 3200 
RRG DQV FAEA EVDGPDG 
TTCGTGGTGCACCCCGACCTGCTCGACGCGGTCTTCTCCGCGGTCGGCGA 3250 
FVVHPDLLDAVFSAVGD

CGGAAGCCGCCAGCCGGCCGGATGGCGCGACCTGACGGTGCACGCGTCGG 3300 

GSR QPAGW RDL TVHAS 
ACGCCACCGTACTGCGCGCCTGCCTCACCCGGCGCACCGACGGAGCCATG 3350
DATVLRACLTRRTDGAM 
GGATTCGCCGCCTTCGACGGCGCCGGCCTGCCGGTACTCACCGCGGAGGC 3400
GFAAFDGAGLPVLTAEA
GGTGACGCTGCGGGAGGTGGCGTCACCGTCCGGCTCCGAGGAGTCGGACG 3450 

VTLREVASPSGSEESD 
GCCTGCACCGGTTGGAGTGGCTCGCGGTCGCCGAGGCGGTCTACGACGGT 3500 
GLHRLEWLAVAEAVYDG

GACCTGCCCGAGGGACATGTCCTGATCACCGCCGCCCACCCCGACGACCC 3550

DLPEGHVLITAAHP DDP 
CGAGGACATACCCACCCGCGCCCACACCCGCGCCACCCGCGTCCTGACCG 3600 
EDI PTRAHTRATRVLT 
CCCTGCAACACCACCTCACCACCACCGACCACACCCTCATCGTCCACACC 3650
ALQHHLTTTDHTLIVHT
ACCACCGACCCCGCCGGCGCCACCGTCACCGGCCTCACCCGCACCGCCCA 3700 

T T D P A G A T V T G L T R T A Q 
GAACGAACACCCCCACCGCATCCGCCTCATCGAAACCGACCACCCCCACA 3750
NEHPHRIRLIETDHPH

CCCCCCTCCCCCTGGCCCAACTCGCCACCCTCGACCACCCCCACCTCCGC 3800 
TPLPLAQLATLDHPHLR 
CTCACCCACCACACCCTCCACCACCCCCACCTCACCCCCCTCCACACCAC 3850 
LTHHTLHHPHLTPLHTT 
CACCCCACCCACCACCACCCCCCTCAACCCCGAACACGCCATCATCATCA 3900
TPPTTTPLNPEHAIII
CCGGCGGCTCCGGCACCCTCGCCGGCATCCTCGCCCGCCACCTGAACCAC 3950
TGGSGTLAGILARHLNH
CCCCACACCTACCTCCTCTCCCGCACCCCACCCCCCGACGCCACCCCCGG 4000
PHTYLLSRTPPPDATPG

CACCCACCTCCCCTGCGACGTCGGCGACCCCCACCAACTCGCCACCACCC 4050 

THLPCDV GDPHQLATT 
TCACCCACATCCCCCAACCCCTCACCGCCATCTTCCACACCGCCGCCACC 4100 
LTH I PQPLTAIFHTAAT 
CTCGACGACGGCATCCTCCACGCCCTCACCCCCGACCGCCTCACCACCGT 4150

LDDGILHALTPDRLTTV 
CCTCCACCCCAAAGCCAACGCCGCCTGGCACCTGCACCACCTCACCCAAA 4200

LHPKANAAWHLHHLTQ
ACCAACCCCTCACCCACTTCGTCCTCTACTCCAGCGCCGCCGCCGTCCTC 4250
NQPLTHFVLYSSAAAVL
GGCAGCCCCGGACAAGGAAACTACGCCGCCGCCAACGCCTTCCTCGACGC 4300

GSPGQGNYAAANAFLDA 
CCTCGCCACCCACCGCCACACCCTCGGCCAACCCGCCACCTCCATCGCCT 4350 

LATHRHTLG QPATSI A 
GGGGCATGTGGCACACCACCAGCACCCTCACCGGACAACTCGACGACGCC 4400
WGMWHTTSTLTGQLDDA
GACCGGGACCGCATCCGCCGCGGCGGTTTCCTCCCGATCACGGACGACGA 4450

DRDRIRRGGFLPITDDE
GGGCATGGGGATGCAT 
G 
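
The pairing of nucleotide lines with amino acid lines in the listing above can be spot-checked by translating a nucleotide line in the printed reading frame. The following is a minimal sketch, assuming the standard genetic code; the function name `translate` and the choice of the first listing line are illustrative and are not part of the original disclosure.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna, offset=0):
    """Translate dna starting at a 0-based offset; a trailing partial codon is ignored."""
    dna = dna.upper().replace(" ", "")
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    # Unknown codons (e.g. containing OCR debris) are reported as "X".
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

# First 50 nucleotides of the cassette listing (coordinates 1-50).
FIRST_LINE = "AGATCTGGCAGCTCGCCGAAGCGCTGCTGACGCTCGTCCGGGAGAGCACC"

# The printed translation starts at the CAG codon, i.e. after 8 leading nucleotides.
protein = translate(FIRST_LINE, offset=8)
print(protein)
```

A line whose translation disagrees with the printed amino acids is a candidate for a transcription error in either the nucleotide line or the amino acid line.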

The NheI-XhoI restriction fragment that encodes module 8 of the FK-520 PKS 
with the endogenous AT domain replaced by the AT domain of module 13 (specific for 
methylmalonyl CoA) of the rapamycin PKS has the DNA sequence and encodes the 
amino acid sequence shown below. 

AGATCTGGCAGCTCGCCGAAGCGCTGCTGACGCTCGTCCGGGAGAGCACC 50 

QLAEALLTLVREST 
GCCGCCGTGCTCGGCCACGTGGGTGGCGAGGACATCCCCGCGACGGCGGC 100 

AAVLGHVGGEDI PATAA 
GTTCAAGGACCTCGGCATCGACTCGCTCACCGCGGTCCAGCTGCGCAACG 150 

FKDLGI DSLTAVQLRN 
CCCTCACCGAGGCGACCGGTGTGCGGCTGAACGCCACGGCGGTCTTCGAC 200 
ALTEATGV RLNATAVFD 
TTCCCGACCCCGCACGTGCTCGCCGGGAAGCTCGGCGACGAACTGACCGG 250 

FPTPHVLAGKLGDELTG 
CACCCGCGCGCCCGTCGTGCCCCGGACCGCGGCCACGGCCGGTGCGCACG 300 

TRAPVVPRTAATAGAH 
ACGAGCCGCTGGCGATCGTGGGAATGGCCTGCCGGCTGCCCGGCGGGGTC 350 
DEPLAIVGMACRLPGGV 
GCGTCACCCGAGGAGCTGTGGCACCTCGTGGCATCCGGCACCGACGCCAT 400

AS PEELWHLVASGTDAI 
CACGGAGTTCCCGACGGACCGCGGCTGGGACGTCGACGCGATCTACGACC 450 

TEFPTDR GWDVDAI YD 
CGGACCCCGACGCGATCGGCAAGACCTTCGTCCGGCACGGTGGCTTCCTC 500 
PDPDAIGKTFVRHGGFL
ACCGGCGCGACAGGCTTCGACGCGGCGTTCTTCGGCATCAGCCCGCGCGA 550 

TGATGFDAAF F GIS PRE 
GGCCCTCGCGATGGACCCGCAGCAGCGGGTGCTCCTGGAGACGTCGTGGG 600 

A L AMD P Q Q R V L L E T S W 
AGGCGTTCGAAAGCGCCGGCATCACCCCGGACTCGACCCGCGGCAGCGAC 650 
E A F E S A G I T P D S T R G S D 
ACCGGCGTGTTCGTCGGCGCCTTCTCCTACGGTTACGGCACCGGTGCGGA 700

TGVFVGAF S YG YGTGAD 
CACCGACGGCTTCGGCGCGACCGGCTCGCAGACCAGTGTGCTCTCCGGCC 750 

T D G F G A T G S Q T S V L S G 
GGCTGTCGTACTTCTACGGTCTGGAGGGTCCGGCGGTCACGGTCGACACG 800 
RLSYFYGLEGPA VTVD T 
GCGTGTTCGTCGTCGCTGGTGGCGCTGCACCAGGCCGGGCAGTCGCTGCG 850

ACSSSLVALHQAGQSLR
CTCCGGCGAATGCTCGCTCGCCCTGGTCGGCGGCGTCACGGTGATGGCGT 900 

SGEC S L ALVG GVTVMA 
CTCCCGGCGGCTTCGTGGAGTTCTCCCGGCAGCGCGGCCTCGCGCCGGAC 950 
SPGGFVEFSRQRGLAPD 
GGCCGGGCGAAGGCGTTCGGCGCGGGTGCGGACGGCACGAGCTTCGCCGA 1000 
GRAKAFGAGADGTSFAE
GGGTGCCGGTGTGCTGATCGTCGAGAGGCTCTCCGACGCCGAACGCAACG 1050

GAGVLIVERL SDAERN 
GTCACACCGTCCTGGCGGTCGTCCGTGGTTCGGCGGTCAACCAGGATGGT 1100 
GHTVLAVVRGSAVNQDG

GCCTCCAACGGGCTGTCGGCGCCGAACGGGCCGTCGCAGGAGCGGGTGAT 1150
ASNGLSAP N GPSQERVI 
CCGGCAGGCCCTGGCCAACGCCGGGCTCACCCCGGCGGACGTGGACGCCG 1200
RQALANAGLTPADVDA
TCGAGGCCCACGGCACCGGCACCAGGCTGGGCGACCCCATCGAGGCACAG 1250
V E A H G T G T R L G D P I E A Q 
GCGGTACTGGCCACCTACGGACAGGAGCGCGCCACCCCCCTGCTGCTGGG 1300 

AVL ATYGQERATPLLLG 
CTCGCTGAAGTCCAACATCGGCCACGCCCAGGCCGCGTCCGGCGTCGCCG 1350 
SLKSNIGHAQAASGVA

GCATCATCAAGATGGTGCAGGCCCTCCGGCACGGGGAGCTGCCGCCGACG 1400
GIIKMVQALRHGELPPT
CTGCACGCCGACGAGCCGTCGCCGCACGTCGACTGGACGGCCGGCGCCGT 1450
LHADEPSPHVDWTAGAV
CGAACTGCTGACGTCGGCCCGGCCGTGGCCCGAGACCGACCGGCCACGGC 1500
ELLTSARPWPETDRPR 
GTGCCGCCGTCTCCTCGTTCGGGGTGAGCGGCACCAACGCCCACGTCATC 1550 
RAAVS S FGVSGTNAHV I 
CTGGAGGCCGGACCGGTAACGGAGACGCCCGCGGCATCGCCTTCCGGTGA 1600
LEAGPVTETPAASPSGD

CCTTCCCCTGCTGGTGTCGGCACGCTCACCGGAAGCGCTCGACGAGCAGA 1650 

LPLLVSARS PEALDEQ 
TCCGCCGACTGCGCGCCTACCTGGACACCACCCCGGACGTCGACCGGGTG 1700 
IRRLRAYLDTTPDVDRV 
30 GCCGTGGCACAGACGCTGGCCCGGCGCACACACTTCGCCCACCGCGCCGT 1750 
A V A Q T L A R R T H F A H R A V 
GCTGCTCGGTGACACCGTCATCACCACACCCCCCGCGGACCGGCCCGACG 1800 

LLGD TVI TTPP ADRPD 
AACTCGTCTTCGTCTACTCCGGCCAGGGCACCCAGCATCCCGCGATGGGC 1850 
ELVFVYSGQGTQHPAMG

GAGCAGCTAGCCGATTCGTCGGTGGTGTTCGCCGAGCGGATGGCCGAGTG 1900 

EQLADS SVVFAERMAEC 
TGCGGCGGCGTTGCGCGAGTTCGTGGACTGGGATCTGTTCACGGTTCTGG 1950
AAALREFVDWDLFTVL 
40 ATGATCCGGCGGTGGTGGACCGGGTTGATGTGGTCCAGCCCGCTTCCTGG 2000 
DD PA VVD RVDVVQPASW 
GCGATGATGGTTTCCCTGGCCGCGGTGTGGCAGGCGGCCGGTGTGCGGCC 2050 

AMMVS L AA VWQAAG V R P 
GGATGCGGTGATCGGCCATTCGCAGGGTGAGATCGCCGCAGCTTGTGTGG 2100 
45 DAVI GHSQGEIAAACV 

CGGGTGCGGTGTCACTACGCGATGCCGCCCGGATCGTGACCTTGCGCAGC 2150 
AG AVS L R DAAR I V .T LRS 
CAGGCGATCGCCCGGGGCCTGGCGGGCCGGGGCGCGATGGCATCCGTCGC 2200 
Q A I A R G L A G R G A M A S V A 
50 CCTGCCCGCGCAGGATGTCGAGCTGGTCGACGGGGCCTGGATCGCCGCCC 2250 
LPAQDVELVDGAWIAA 
ACAACGGGCCCGCCTCCACCGTGATCGCGGGCACCCCGGAAGCGGTCGAC 2300 
HNGPASTVIAGTPEAVD 
CATGTCCTCACCGCTCATGAGGCACAAGGGGTGCGGGTGCGGCGGATCAC 2350 
55 HVLTAHEAQGVRVRRIT 

CGTCGACTATGCCTCGCACACCCCGCACGTCGAGCTGATCCGCGACGAAC 2400 

VDYASHTPHVELI RDE 
TACTCGACATCACT AGCGACAGCAGCTCGCAGACCCCGCTCGTGCCGTGG 2450 
LLDITSDSSSQTPLVPW 
60 CTGTCGACCGTGGACGGCACCTGGGTCGACAGCCCGCTGGACGGGGAGTA 2500 
LSTVDGTWVDSPLDGEY 
CTGGTACCGGAACCTGCGTGAACCGGTCGGTTTCCACCCCGCCGTCAGCC 2550 

WYRNLRE PVGFHPAVS 
AGTTGCAGGCCCAGGGCGACACCGTGTTCGTCGAGGTCAGCGCCAGCCCG 2600 
QLQAQGDTVFVEVSASP
GTGTTGTTGCAGGCGATGGACGACGATGTCGTCACGGTTGCCACGCTGCG 2650

VLLQAMDDDVVTVAT LR 
TCGTGACGACGGCGACGCCACCCGGATGCTCACCGCCCTGGCACAGGCCT 2700 
5 RDDGDATRMLTALAQA 

ATGTCCACGGCGTCACCGTCGACTGGCCCGCCATCCTCGGCACCACCACA 2750 
YVHGVTVDWP AI LGTT T 
ACCCGGGTACTGGACCTTCCGACCTACGCCTTCCAACACCAGCGGTACTG 2800 
T R V L D L P T Y A F Q H Q R Y W 
GCTCGAGTCGGCACGCCCGGCCGCATCCGACGCGGGCCACCCCGTGCTGG 2850
LE SARPA ASDAGHP VL 
GCTCCGGTATCGCCCTCGCCGGGTCGCCGGGCCGGGTGTTCACGGGTTCC 2900 
GSG IALAGS PGRVFT GS 
GTGCCGACCGGTGCGGACCGCGCGGTGTTCGTCGCCGAGCTGGCGCTGGC 2950 
15 VPTGADRAV FVAELALA 

CGCCGCGGACGCGGTCGACTGCGCCACGGTCGAGCGGCTCGACATCGCCT 3000 

AADAVDCATVERLDIA
CCGTGCCCGGCCGGCCGGGCCATGGCCGGACGACCGTACAGACCTGGGTC 3050 
SVPGRPGHGRTTVQTWV 
GACGAGCCGGCGGACGACGGCCGGCGCCGGTTCACCGTGCACACCCGCAC 3100
DEPADDGRRRFTV HTRT 
CGGCGACGCCCCGTGGACGCTGCACGCCGAGGGGGTGCTGCGCCCCCATG 3150 

GDAPWTLHAEGVLRPH
GCACGGCCCTGCCCGATGCGGCCGACGCCGAGTGGCCCCCACCGGGCGCG 3200 
25 G T A L P DAADAEWPP PG A 

GTGCCCGCGGACGGGCTGCCGGGTGTGTGGCGCCGGGGGGACCAGGTCTT 3250 

VPADGLPGVWRRGDQVF
CGCCGAGGCCGAGGTGGACGGACCGGACGGTTTCGTGGTGCACCCCGACC 3300 
AEAE V DG P D G FVVH P D 
30 TGCTCGACGCGGTCTTCTCCGCGGTCGGCGACGGAAGCCGCCAGCCGGCC 3350 
LL D A V F SAV G DG S RQ P A 
GGATGGCGCGACCTGACGGTGCACGCGTCGGACGCCACCGTACTGCGCGC 3400 

G W R D L T V H A S D A T V L R A 
CTGCCTCACCCGGCGCACCGACGGAGCCATGGGATTCGCCGCCTTCGACG 34 50 
35 CLTRRTDGAMGFAAFD 

GCGCCGGCCTGCCGGTACTCACCGCGGAGGCGGTGACGCTGCGGGAGGTG 3500 
GAGLPVLTAEAVTLREV 
GCGTCACCGTCCGGCTCCGAGGAGTCGGACGGCCTGCACCGGTTGGAGTG 3550 
ASPSGSEES DGLHRLEW 
40 GCTCGCGGTCGCCGAGGCGGTCTACGACGGTGACCTGCCCGAGGGACATG 3600 
LAVAEAVYDGDLPEGH 
TCCTGATCACCGCCGCCCACCCCGACGACCCCGAGGACATACCCACCCGC 3650 
VLITAAHPDDPE DI PTR 
GCCCACACCCGCGCCACCCGCGTCCTGACCGCCCTGCAACACCACCTCAC 3700 
45 AH T RA T R V L T A L Q H H L T 

CACCACCGACCACACCCTCATCGTCCACACCACCACCGACCCCGCCGGCG 3750 

T T D H T L I V H T T T D ,P A G 
CCACCGTCACCGGCCTCACCCGCACCGCCCAGAACGAACACCCCCACCGC 3800 
A T V T G L T R T A Q N E H P H R 
50 ATCCGCCTCATCGAAACCGACCACCCCCACACCCCCCTCCCCCTGGCCCA 3850 
I R L I E T D H P H T P L P L A'Q 
ACTCGCCACCCTCGACCACCCCCACCTCCGCCTCACCCACCACACCCTCC 3900 

L A T L D H P H L R L T H H T L 
ACCACCCCCACCTCACCCCCCTCCACACCACCACCCCACCCACCACCACC 3950 
55 HHPHLTPL HT T TP PTTT 

CCCCTCAACCCCGAACACGCCATCATCATCACCGGCGGCTCCGGCACCCT 4000 

PLNPEHAIIITGGSGTL
CGCCGGCATCCTCGCCCGCCACCTGAACCACCCCCACACCTACCTCCTCT 4050
AGILARHLNHPHTYLL
CCCGCACCCCACCCCCCGACGCCACCCCCGGCACCCACCTCCCCTGCGAC 4100
SRTPPPDATPGTHLPCD
GTCGGCGACCCCCACCAACTCGCCACCACCCTCACCCACATCCCCCAACC 4150 

VGDPHQLATTLTHIPQP 
CCTCACCGCCATCTTCCACACCGCCGCCACCCTCGACGACGGCATCCTCC 4200 
LTAIFHTAATLDDGIL
ACGCCCTCACCCCCGACCGCCTCACCACCGTCCTCCACCCCAAAGCCAAC 4250
HALT PDRLTTVLH PKAN 
GCCGCCTGGCACCTGCACCACCTCACCCAAAACCAACCCCTCACCCACTT 4 300 

AAWHLHHLTQNQPLTHF 
CGTCCTCTACTCCAGCGCCGCCGCCGTCCTCGGCAGCCCCGGACAAGGAA 4350

VL Y S SAAAVLGS P GQG 
ACTACGCCGCCGCCAACGCCTTCCTCGACGCCCTCGCCACCCACCGCCAC 4 400 
NYAAANA FLDALATHRH 
ACCCTCGGCCAACCCGCCACCTCCATCGCCTGGGGCATGTGGCACACCAC 4 450 

TLGQPATSIAWGMWHTT
CAGCACCCTCACCGGACAACTCGACGACGCCGACCGGGACCGCATCCGCC 4 500 

S TLTGQLDDADRDRIR 
GCGGCGGTTTCCTCCCGATCACGGACGACGAGGGCATGGGGATGCAT 
RGGFLPITDDEG
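
The restriction sites that bound the swapped fragment can be located directly from individual listing lines. The sketch below is illustrative only: it assumes each listing line carries exactly the nucleotides counted by the running coordinate printed at its end, and the helper name `find_sites` is hypothetical. The two example lines are the ones ending at coordinates 1900 and 2850 in the listing above.

```python
def find_sites(line_seq, line_end, site):
    """Return 1-based listing coordinates at which `site` starts within one line.

    line_seq -- nucleotides printed on the line (spaces are ignored)
    line_end -- running coordinate printed at the end of that line
    """
    seq = line_seq.upper().replace(" ", "")
    line_start = line_end - len(seq) + 1
    hits = []
    pos = seq.find(site)
    while pos != -1:
        hits.append(line_start + pos)
        pos = seq.find(site, pos + 1)
    return hits

NHEI = "GCTAGC"  # NheI recognition sequence
XHOI = "CTCGAG"  # XhoI recognition sequence

line_1900 = "GAGCAGCTAGCCGATTCGTCGGTGGTGTTCGCCGAGCGGATGGCCGAGTG"
line_2850 = "GCTCGAGTCGGCACGCCCGGCCGCATCCGACGCGGGCCACCCCGTGCTGG"

nhei_hits = find_sites(line_1900, 1900, NHEI)
xhoi_hits = find_sites(line_2850, 2850, XHOI)
print("NheI:", nhei_hits, "XhoI:", xhoi_hits)
```

Because the search is per line, a site that straddles two listing lines would be missed; joining the cleaned lines into one string before searching avoids that limitation.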

Phage KC515 DNA was prepared using the procedure described in Genetic 
Manipulation of Streptomyces, A Laboratory Manual, edited by D. Hopwood et al. A 
phage suspension prepared from 10 plates (100 mm) of confluent plaques of KC515 on 
S. lividans TK24 generally gave about 3 µg of phage DNA. The DNA was ligated to 
circularize at the cos site, subsequently digested with restriction enzymes BamHI and 
PstI, and dephosphorylated with SAP. 

Each module 8 cassette described above was excised with restriction enzymes 
BglII and NsiI and ligated into the compatible BamHI and PstI sites of KC515 phage 
DNA prepared as described above. The ligation mixture containing KC515 and various 
cassettes was transfected into protoplasts of Streptomyces lividans TK24 using the 
procedure described in Genetic Manipulation of Streptomyces, A Laboratory Manual, 
edited by D. Hopwood et al., and overlaid with TK24 spores. After 16-24 hr, the plaques 
were restreaked on plates overlaid with TK24 spores. Single plaques were picked and 
resuspended in 200 µL of nutrient broth. Phage DNA was prepared by the boiling 
method (Hopwood et al., supra). PCR with primers spanning the left and right 
boundaries of the recombinant phage was used to verify that the correct phage had been 
isolated. In most cases, at least 80% of the plaques contained the expected insert. To 
confirm the presence of the resistance marker (thiostrepton), a spot test is used, as 
described in Lomovskaya et al. (1997), in which a plate with spots of phage is overlaid 
with a mixture of spores of TK24 and phiC31 TK24 lysogen. After overnight incubation, 
the plate is overlaid with antibiotic in soft agar. A working stock is made of all phage 
containing desired constructs. 
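
The compatibility of the BglII and NsiI cassette ends with the BamHI and PstI vector ends follows from the single-stranded overhangs the enzymes leave. The sketch below is a simplified illustration, assuming palindromic recognition sites and the commonly documented cut positions (A^GATCT, G^GATCC, ATGCA^T, CTGCA^G); the table and function names are hypothetical.

```python
# Recognition site (5'->3', top strand) and top-strand cut index for each enzyme.
ENZYMES = {
    "BglII": ("AGATCT", 1),
    "BamHI": ("GGATCC", 1),
    "NsiI": ("ATGCAT", 5),
    "PstI": ("CTGCAG", 5),
}

def overhang(name):
    """Return (overhang type, overhang sequence) for a palindromic site."""
    site, cut = ENZYMES[name]
    # For a palindromic site the bottom strand is cut at the mirrored position.
    mirror = len(site) - cut
    kind = "5'" if cut < mirror else "3'"
    return kind, site[min(cut, mirror):max(cut, mirror)]

def compatible(a, b):
    """Two ends can be ligated when overhang type and sequence both match."""
    return overhang(a) == overhang(b)

# Termini of the cassette listings: each begins with AGATCT and ends with ATGCAT.
cassette_start = "AGATCTGGCAGC"
cassette_end = "GGGCATGGGGATGCAT"
has_bglii_end = cassette_start.startswith(ENZYMES["BglII"][0])
has_nsii_end = cassette_end.endswith(ENZYMES["NsiI"][0])

for pair in (("BglII", "BamHI"), ("NsiI", "PstI"), ("BglII", "PstI")):
    print(pair, compatible(*pair))
```

Under these assumptions the two end pairs are mutually incompatible with each other, which is what makes the cloning directional.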

Streptomyces hygroscopicus ATCC 14891 (see U.S. Patent No. 3,244,592, issued 
5 Apr 1966, incorporated herein by reference) mycelia were infected with the 
recombinant phage by mixing the spores and phage (1 x 10^8 of each), and incubating on 
R2YE agar (Genetic Manipulation of Streptomyces, A Laboratory Manual, edited by D. 
Hopwood et al.) at 30°C for 10 days. Recombinant clones were selected and plated on 
minimal medium containing thiostrepton (50 µg/ml) to select for the thiostrepton 
resistance-conferring gene. Primary thiostrepton-resistant clones were isolated and 
purified through a second round of single colony isolation, as necessary. To obtain 
thiostrepton-sensitive revertants that underwent a second recombination event to evict 
the phage genome, primary recombinants were propagated in liquid media for two to 
three days in the absence of thiostrepton and then spread on agar medium without 
thiostrepton to obtain spores. Spores were plated to obtain about 50 colonies per plate, 
and thiostrepton-sensitive colonies were identified by replica plating onto thiostrepton-
containing agar medium. PCR was used to determine which of the thiostrepton-
sensitive colonies reverted to the wild type (reversal of the initial integration event) and 
which contained the desired AT swap at module 8 in the ATCC 14891-derived cells. The 
PCR primers used amplified either the KS/AT junction or the AT/DH junction of the 
wild-type and the desired recombinant strains. Fermentation of the recombinant strains, 
followed by isolation of the metabolites and analysis by LC-MS and NMR, is used to 
characterize the novel polyketide compounds. 
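
The screening logic described above (thiostrepton phenotype first, then junction PCR) can be summarized as a small decision function. This is only an illustrative sketch: the encoding of the PCR result as "swap" or "wild-type" and the example colony records are hypothetical and do not represent data from the original work.

```python
def classify(grows_on_thiostrepton, junction_product):
    """Classify one colony from replica plating and junction PCR.

    grows_on_thiostrepton -- True if the colony grows on thiostrepton-containing agar
    junction_product      -- "swap" or "wild-type": which junction-specific PCR
                             yielded a product (hypothetical encoding), else None
    """
    if grows_on_thiostrepton:
        # The phage genome, carrying the resistance gene, is still integrated.
        return "primary recombinant"
    if junction_product == "swap":
        return "AT swap"
    if junction_product == "wild-type":
        return "wild-type revertant"
    return "unresolved"

# Hypothetical colony records: (colony id, grows on thiostrepton, PCR result).
colonies = [
    ("c1", True, None),
    ("c2", False, "wild-type"),
    ("c3", False, "swap"),
    ("c4", False, None),
]

results = {cid: classify(resistant, pcr) for cid, resistant, pcr in colonies}
for cid, label in results.items():
    print(cid, label)
```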

Example 2 

Replacement of Methoxyl with Hydrogen or Methyl at C-13 of FK-506 

The present invention also provides the 13-desmethoxy derivatives of FK-506 
and the novel PKS enzymes that produce them. A variety of Streptomyces strains that 
produce FK-506 are known in the art, including S. tsukubaensis No. 9993 (FERM 
BP-927), described in U.S. Patent No. 5,624,852, incorporated herein by reference; S. 
hygroscopicus subsp. yakushimaensis No. 7238, described in U.S. Patent No. 4,894,366, 
incorporated herein by reference; S. sp. MA6858 (ATCC 55098), described in U.S. 
Patent No. 5,116,756, incorporated herein by reference; and S. sp. MA6548, described 
in Motamedi et al., 1998, "The biosynthetic gene cluster for the macrolactone ring of the 
immunosuppressant FK-506," Eur. J. Biochem. 256: 528-534, and Motamedi et al., 
1997, "Structural organization of a multifunctional polyketide synthase involved in the 
biosynthesis of the macrolide immunosuppressant FK-506," Eur. J. Biochem. 244: 
74-80, each of which is incorporated herein by reference. 

The complete sequence of the FK-506 gene cluster from Streptomyces sp. 
MA6548 is known, and the sequences of the corresponding gene clusters from other 
FK-506-producing organisms are highly homologous thereto. The novel FK-506 recombinant 
gene clusters of the present invention differ from the naturally occurring gene clusters in 
that the AT domain of module 8 of the naturally occurring PKSs is replaced by an AT 
domain specific for malonyl CoA or methylmalonyl CoA. These AT domain 
replacements are made at the DNA level, following the methodology described in 
Example 1. 
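
At the sequence level, a domain replacement of this kind amounts to exchanging the segment bounded by two restriction sites for the corresponding segment of a donor. The following is a minimal in-silico sketch using toy sequences; it assumes both sequences already carry the two flanking sites (here the AvrII and XhoI recognition sequences), and the function name `swap_between_sites` and the toy sequences are hypothetical rather than taken from the disclosure.

```python
def swap_between_sites(host, donor, left_site, right_site):
    """Replace the host segment bounded by left_site..right_site with the donor's."""
    def bounded(seq):
        # Start of the left site and end of the first right site downstream of it.
        start = seq.index(left_site)
        end = seq.index(right_site, start + len(left_site)) + len(right_site)
        return start, end

    h_start, h_end = bounded(host)
    d_start, d_end = bounded(donor)
    return host[:h_start] + donor[d_start:d_end] + host[h_end:]

AVRII = "CCTAGG"  # AvrII recognition sequence
XHOI = "CTCGAG"   # XhoI recognition sequence

# Toy sequences only: flank + site + "domain" + site + flank.
host = "AAAA" + AVRII + "TTTTTT" + XHOI + "GGGG"
donor = "CC" + AVRII + "ACGTACGT" + XHOI + "TT"

hybrid = swap_between_sites(host, donor, AVRII, XHOI)
print(hybrid)
```

The host flanks are preserved and only the bounded segment changes, so the reading frame of the hybrid depends on the donor segment having a length compatible with the segment it replaces.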

The naturally occurring module 8 sequence for the MA6548 strain is shown 
below, followed by the illustrative hybrid module 8 sequences for the MA6548 strain. 

GCATGCGGCTGTACGAGGCGGCACGGCGCACCGGAAGTCCCGTGGTGGTG 50
MRLYEAARRTGSPV VV 
GCGGCCGCGCTCGACGACGCGCCGGACGTGCCGCTGCTGCGCGGGCTGCG 100 
10 AAA L D D A P D V P L L R G L R 

GCGTACGACCGTCCGGCGTGCCGCCGTCCGGGAACGCTCTCTCGCCGACC 15 0 

RTTVR-RAAVRERS LAD 
GCTCGCCGTGCTGCCCGACGACGAGCGCGCCGACGCCTCCCTCGCGTTCG 200 
RSPCCPTTSAPTPPSRS 

15 

TCCTGGAACAGCACCGCCACCGTGCTCGGCCACCTGGGCGCCGAAGACAT 250 

SWNSTAT VLGHLGAEDI. 
CCCGGCGACGACGACGTTCAAGGAACTCGGCATCGACTCGCTCACCGCGG 300 

PATTTFKELGI DSLTA 
TCCAGCTGCGCAACGCGCTGACCACGGCGACCGGCGTACGCCTCAACGCC 350 
20 VQLRNAL TT ATGVRLNA 

ACAGCGGTCTTCGACTTTCCGACGCCGCGCGCGCTCGCCGCGAGACTCGG 4 00 

TAVFDFPT PRALAARLG 
CGACGAGCTGGCCGGTACCCGCGCGCCCGTCGCGGCCCGGACCGCGGCCA 450 
DELAGTRAPVAARTAA 
25 CCGCGGCCGCGCACGACGAACCGCTGGCGATCGTGGGCATGGCCTGCCGT 500 

TAA AH DEPL A I VGMACR 
CTGCCGGGCGGGGTCGCGTCGCCACAGGAGCTGTGGCGTCTCGTCGCGTC 550 

LPGGVASPQEL W R L V A S 
CGGCACCGACGCCATCACGGAGTTCCCCGCGGACCGCGGCTGGGACGTGG 600 
30 GTDAI TEFPADRGWDV 

ACGCGCTCTACGACCCGGACCCCGACGCGATCGGCAAGACCTTCGTCCGG 650 
DALYDPDPDAIGKTFVR 
CACGGCGGCTTCCTCGACGGTGCGACCGGCTTCGACGCGGCGTTCTTCGG 700 
HGGFLDGATGFDAAFFG 

35 

GATCAGCCCGCGCG AGGCCCTGGCCATGGACCCGCAGCAACGGGTGCTCC 750 

I S P REAL A MDPQQRVL 
TGGAGACGTCCTGGGAGGCGTTCGAAAGCGCGGGCATCACCCCGGACGCG 800 
LETSWE AFESAG I TPDA 
GCGCGGGGCAGCGACACCGGCGTGTTCATCGGCGCGTTCTCCTACGGGTA 850 
40 ARGS DTGV F IGAFSYGY 

CGGCACGGGTGCGGATACCAACGGCTTCGGCGCGACAGGGTCGCAGACCA 900 

GTGAD TNGFGATG SQT 
GCGTGCTCTCCGGCCGCCTCTCGTACTTCTACGGTCTGGAGGGCCCTTCG 950 
SVLSG RLSYFY GL E GPS 
45 GTCACGGTCGACACCGCCTGCTCGTCGTCACTGGTCGCCCTGCACCAGGC 1000 

VTVDTACSSSLVALHQA 
AGGGCAGTCCCTGCGCTCGGGCGAATGCTCGCTCGCCCTGGTCGGCGGTG 1050 

GQSLRSGECSLALVGG 
TCACGGTGATGGCGTCGCCCGGCGG ATTCGTCGAGTTCTCCCGGCAGCGC 1100 
50 VTVMASPGGFVEFSRQR 

GGGCTCGCGCCGGACGGGCGGGCGAAGGCGTTCGGCGCGGGCGCGGACGG 1150

GLAP DGRA KAFGAGADG 
TACGAGCTTCGCCGAGGGCGCCGGTGCCCTGGTGGTCGAGCGGCTCTCCG 1200 
TSFAEGAGALVVERLS 
55 ACGCGGAGCGCCACGGCCACACCGTCCTCGCCCTCGTACGCGGCTCCGCG 1250 

DAERHGHTVLALVRGSA
GCTAACTCCGACGGCGCGTCGAACGGTCTGTCGGCGCCGAACGGCCCCTC 1300

ANSDGASNGLSAPNGPS
CCAGGAACGCGTCATCCACCAGGCCCTCGCGAACGCGAAACTCACCCCCG 1350 
QERVIHQALANAKLTP
CCGATGTCGACGCGGTCGAGGCGCACGGCACCGGCACCCGCCTCGGCGAC 1400
ADVDAVEAHGTGTRLGD 
CCCATCGAGGCGCAGGCGCTGCTCGCGACGTACGGACAGGACCGGGCGAC 14 50 
5 PIEAQALLATYGQDRAT 

GCCCCTGCTGCTCGGCTCGCTGAAGTCGAACATCGGGCACGCCCAGGCCG 1500 

PLLLGSLKSNIGHAQA 
CGTCAGGGGTCGCCGGGATCATCAAGATGGTGCAGGCCATCCGGCACGGG 1550 
ASGVAGI I KMVQAIRHG 
10 GAACTGCCGCCGACACTGCACGCGGACGAGCCGTCGCCGCACGTCGACTG 1600 

E LPPTLHA DE PSPH VDW 
GACGGCCGGTGCCGTCGAGCTCCTGACGTCGGCCCGGCCGTGGCCGGGGA 1650 

TAGAVELLTSARPW PG 
CCGGTCGCCCGCGCCGCGCTGCCGTCTCGTCGTTCGGCGTGAGCGGCACG 1700 
15 T G R P R R A AVSSFGVSGT 

AACGCCCACATCATCCTTGAGGCAGGACCGGTCAAAACGGGACCGGTCGA 1750

NAHI IL EAGPVKTG PV E 
GGCAGGAGCGATCGAGGCAGGACCGGTCGAAGTAGGACCGGTCGAGGCTG 1800 
AGAIEAGPVEVGPVEA 
20 GACCGCTCCCCGCGGCGCCGCCGTCAGCACCGGGCGAAGACCTTCCGCTG 1850 

GPLPAAPPSAPG EDLPL 
CTCGTGTCGGCGCGTTCCCCGGAGGCACTCGACGAGCAGATCGGGCGCCT 1 900 

LVSARSPEALDEQ I G R L 
GCGCGCCTATCTCGACACCGGCCCGGGCGTCGACCGGGCGGCCGTGGCGC 1950 
25 RAYL DTGPGVDRAAVA 

AGACACTGGCCCGGCGTACGCACTTCACCCACCGGGCCGTACTGCTCGGG 2000 
QT LA RR T H FT H RAVLL G 
GACACCGTCATCGGCGCTCCCCCCGCGGACCAGGCCGACGAACTCGTCTT 2050 
DTV I GAP PADQAD ELV F 
30 CGTCTACTCCGGTCAGGGCACCCAGCATCCCGCGATGGGCGAGCAACTCG 2100 

VYSGQGTQHPAMGEQL 
CGGCCGCGTTCCCCGTGTTCGCCGATGCCTGGCACGACGCGCTCCGACGG 2150 
AAA FP VF AD AWHD A L R R 
CTCGACGACCCCGACCCGCACGACCCCACACGGAGCCAGCACACGCTCTT 2200 
35 LDDPDPHDPTRSQHTLF 

CGCCCACCAGGCGGCGTTCACCGCCCTCCTGAGGTCCTGGGACATCACGC 2250

AHQAAFTALL RS WDIT 
CGCACGCCGTCATCGGCCACTCGCTCGGCGAGATCACCGCCGCGTACGCC 2300 
PHAVIGHSLGEITAAYA 
40 GCCGGGATCCTGTCGCTCGACGACGCCTGCACCCTGATCACCACGCGTGC 2350 

A G I L S L D DAC T L I T T R A 
CCGCCTC ATGCACACGCTTCCGCCGCCCGGCGCCATGGTC ACCGTGCTGA 2400 

RLMH TL PPPG AMVTVL 
CCAGCGAGGAGGAGGCCCGTCAGGCGCTGCGGCCGGGCGTGGAGATCGCC 2450 
45 T S E E EA 'RQALR PG VE I A 

GCGGTCTTCGGCCCGCACTCCGTCGTGCTCTCGGGCGACGAGGACGCCGT 2500 

A V F G P H S V V L S G D E D A V 
GCTCGACGTCGCACAGCGGCTCGGCATCCACCACCGTCTGCCCGCGCCGC 2550 
L D V A Q R L G I H H R L P A P 
ACGCGGGCCACTCCGCGCACATGGAACCCGTGGCCGCCGAGCTGCTCGCC 2600

HAGHSAHMEPVAAELLA 
ACCACTCGCGAGCTCCGTTACGACCGGCCCCACACCGCCATCCCGAACGA 2650 

TTRE LRYDRPHT AI PND 
CCCCACCACCGCCGAGTACTGGGCCGAGCAGGTCCGCAACCCCGTGCTGT 2700 
55 PTTAEYWAEQ VR NPVL 

TCCACGCCCACACCCAGCGGTACCCCGACGCCGTGTTCGTCGAGATCGGC 2750 

FHAHTQRYPDAVFVEIG
CCCGGCCAGGACCTCTCACCGCTGGTCGACGGCATCGCCCTGCAGAACGG 2800
PGQDLSPLVDGIALQNG
CACGGCGGACGAGGTGCACGCGCTGCACACCGCGCTCGCCCGCCTCTTCA 2850

TADEVHALHTALARLF

CACGCGGCGCCACGCTCGACTGGTCCCGCATCCTCGGCGGTGCTTCGCGG 2900 
T R G A T - L D W S R I LG GAS R 
CACGACCCTGACGTCCCCTCGTACGCGTTCCAGCGGCGTCCCTACTGGAT 2950 
HDPDVPSYAFQRRPYWI
CGAGTCGGCTGGGGGGGGGACGGCCGACTCGGGCCACCCCGTCCTCGGCA 3000 

E SA P PATA DSGH PV L G 
CCGGAGTCGCCGTCGCCGGGTCGCCGGGCCGGGTGTTCACGGGTCCCGTG 3050 
5 TGVAVAGS PGRV FTG PV 

CCCGCCGGTGCGGACCGCGCGGTG TTCATCGCCGAACTGGCGCTCGCCGC 3100 

PAGADRAV FIAELALAA 
CGCCGACGCCACCGACTGCGCCACGGTCGAACAGCTCGACGTCACCTCCG 3150 
A D A T DC A T VEQL DVT S 
10 TGCCCGGCGGATCCGCCCGCGGCAGGGCCACCGCGCAGACCTGGGTCGAT 3200 

V PG GSAR GRATAQTWVD 
GAACCCGCCGCCGACGGGCGGCGCCGCTTCACCGTCCACACCCGCGTCGG 3250 

E PAADGRRRFTVHTRVG 
CGACGCCCCGTGGACGCTGCACGCCGAGGGGGTTCTCCGCCCCGGCCGCG 3300 
15 DAPWTL HAEG VLRPGR 

TGCCCCAGCCCGAAGCCGTCGACACCGCCTGGCCCCCGCCGGGCGCGGTG 3350 
VPQPEAVDTAWPPPGAV 
CCCGCGGACGGGCTGCCCGGGGCGTGGCGACGCGCGGACCAGGTCTTCGT 3400 
PADGLPGAWRRADQVFV 
20 CGAAGCCGAAGTCGACAGCCCTGACGGCTTCGTGGCACACCCCGACCTGC 3450 

EAEVDS PDGFVAHPDL 
TCGACGCGGTCTTCTCCGCGGTCGGCGACGGGAGCCGCCAGCCGACCGGA 3500 
LDAVFSAVGDGSRQPTG 
TGGCGCGACCTCGCGGTGCACGCGTCGGACGCCACCGTGCTGCGCGCCTG 3550 
25 WRDLAVHASDATVLRAC 

CCTCACCCGCCGCGACAGTGGTGTCGTGGAGCTCGCCGCCTTCGACGGTG 3600 

LTRRDSGVVELAAFDG 
CCGGAATGCCGGTGCTCACCGCGGAGTCGGTGACGCTGGGCGAGGTCGCG 3650 
AGMPVLTAESVTLGEVA 
30 TCGGCAGGCGGATCCGACGAGTCGGACGGTCTGCTTCGGCTTGAGTGGTT 3700 

SAGGSDESDGLLRLEWL
GCCGGTGGCGGAGGCCCACTACGACGGTGCCGACGAGCTGCCCGAGGGCT 3750 

P V A E A H Y D G A D E L P E G 
ACACCCTCATCACCGCCACACACCCCG ACGACCCCGACGACCCCACCAAC 3800 
35 YTLITATHPDDPDDPTN 

CCCCACAACACACCCACACGCACCCACACACAAACCACACGCGTCCTCAC 3850 

PHNTPTRTHTQTTRVLT 
CGCCCTCCAACACCACCTCATCACCACCAACCACACCCTCATCGTCCACA 3900 
ALQHHLI TTNHTLIVH 
40 CCACCACCGACCCCCCAGGCGCCGCCGTCACCGGCCTCACCCGCACCGCA 3950 

TTTDPP.GAAVTGLTRTA 
CAAAACGAACACCCCGGCCGCATCCACCTCATCGAAACCCACCACCCCCA 4000 

QNEHPGR I .HLIETHHPH 
CACCCC ACTCCCCCTCACCCAACTCACCACCCTCCACCAACCCCACCTAC 4 050 
45 TPLPLTQL TTLHQPHL 

GCCTCACCAACAACACCCTCCACACCCCCCACCTCACCCCCATCACCACC 4100 
RLTNNTLHTPHLTPITT
CACCACAACACCACCACAACCACCCCCAACACCCCACCCCTCAACCCCAA 4150
HHNTTTTTPNTPPLNPN
50 CCACGCCATCCTCATCACCGGCGGCTCCGGCACCCTCGCCGGCATCCTCG 4200 

HAILITGGSGTLAGIL 
CCCGCCACCTCAACCACCCCCACACCTACCTCCTCTCCCGCACACCACCA 4250 
ARHLNHPHTYLLSRTPP 
CCCCCCACCACACCCGGCACCCACATCCCCTGCGACCTCACCGACCCCAC 4300 
55 PPTTPGTHIPCDLTDPT 

CCAAATCACCCAAGCCCTCACCCACAT ACCACAACCCCTCACCGGCATCT 4350 

QITQALTHIPQPLTGI
TCCACACCGCCGCCACCCTCGACGACGCCACCCTCACCAACCTCACCCCC 4400 
FH TAAT L D DATLTN L T P 
60 CAACACCTCACCACCACCCTCCAACCCAAAGCCGACGCCGCCTGGCACCT 4450 

QHLTTTLQPKADA A W H L 
CCACCACCACACCCAAAACCAACCCCTCACCCACTTCGTCCTCTACTCCA 4 500 

HHHTQNQPLTHFVLYS 
GCGCCGCCGCCACCCTCGGCAGCCCCGGCCAAGCCAACTACGCCGCCGCC 4550 
SAAATLGSPGQANYAAA
AACGCCTTCCTCGACGCCCTCGCCACCCACCGCCACACCCAAGGACAACC 4 600 

NAFL DALATHRHTQGQP 
CGCCACCACCATCGCCTGGGGCATGTGGCACACCACCACCACACTCACCA 4 650 
5 ATT IAWGMWHTTTTLT 

GCCAACTCACCGACAGCGACCGCGACCGCATCCGCCGCGGCGGCTTCCTG 4 700 
SQLTDSDRDRIRR GGFL 
CCGATCTCGGACGACGAGGGCATGC 

PISDDEG M 

The AvrII-XhoI hybrid FK-506 PKS module 8 containing the AT domain of 
module 12 of rapamycin is shown below. 

GCATGCGGCTGTACGAGGCGGCACGGCGCACCGGAAGTCCCGTGGTGGTG 50 
MRLYEA ARRTGSPVVV 
15 GCGGCCGCGCTCGACGACGCGCCGGACGTGCCGCTGCTGCGCGGGCTGCG 100 
AAA LD DAP DVPL LRGLR. 
GCGTACGACCGTCCGGCGTGCCGCCGTCCGGGAACGCTCTCTCGCCGACC 150 

RTTV.RRA AVRERSL AD 
GCTCGCCGTGCTGCCCGACGACGAGCGCGCCGACGCCTCCCTCGCGTTCG 200 
20 RSPCCPTTSAPTPPSRS 

TCCTGGAACAGCACCGCCACCGTGCTCGGCCACCTGGGCGCCGAAGACAT 250 

SWNSTATVLGHLGAEDI 
CCCGGCGACGACGACGTTCAAGGAACTCGGCATCGACTCGCTCACCGCGG 300 
PATTTFKELGI DSLTA 
25 TCCAGCTGCGCAACGCGCTG ACCACGGCG ACCGGCGTACGCCTCAACGCC 350 
VQLRNALTTATGVRLNA 
ACAGCGGTCTTCGACTTTCCGACGCCGCGCGCGCTCGCCGCGAGACTCGG 400 

TAV FDFPT PRALAARLG 
CGACGAGCTGGCCGGTACCCGCGCGCCCGTCGCGGCCCGGACCGCGGCCA 450 
30 DE LA G T RA P V AARTAA 

CCGCGGCCGCGCACGACGAACCGCTGGCGATCGTGGGCATGGCCTGCCGT 500 
TAAAH DEPLA IVGMACR 
CTGCCGGGCGGGGTCGCGTCGCCACAGGAGCTGTGGCGTCTCGTCGCGTC 550 
LPGGVA S PQELWRLVA S 
CGGCACCGACGCCATCACGGAGTTCCCCGCGGACCGCGGCTGGGACGTGG 600
GTDAITEFPADRGWDV 
ACGCGCTCTACGACCCGGACCCCGACGCGATCGGCAAGACCTTCGTCCGG 650 
DAL YDPDPDAIGKTFVR 
CACGGCGGCTTCCTCGACGGTGCGACCGGCTTCGACGCGGCGTTCTTCGG 700 
40 HGG FLDGATGFDAA FFG 

GATCAGCCCGCGCGAGGCCCTGGCCATGGACCCGCAGCAACGGGTGCTCC 750 

ISPREALAMDPQQRVL
TGGAGACGTCCTGGGAGGCGTTCGAAAGCGCGGGCATCACCCCGGACGCG 800 
LETSWE AFESAGITPDA 
45 GCGCGGGGCAGCGACACCGGCGTGTTCATCGGCGCGTTCTCCTACGGGTA 850 
ARGSDTGVFIGAFSYGY
CGGCACGGGTGCGGATACCAACGGCTTCGGCGCGACAGGGTCGCAGACCA 900

G T G A D T N G F G A T G S Q T 
GCGTGCTCTCCGGCCGCCTCTCGTACTTCTACGGTCTGGAGGGCCCTTCG 950 
50 SVLSGRLS YFYGLEGP S 

GTCACGGTCGACACCGCCTGCTCGTCGTCACTGGTCGCCCTGCACCAGGC 1000

VTVDTACS S S LV ALHQA 
AGGGCAGTCCCTGCGCTCGGGCGAATGCTCGCTCGCCCTGGTCGGCGGTG 1050 

GQSLRSGECSLALVGG

55 TCACGGTGATGGCGTCGCCCGGCGGATTCGTCGAGTTCTCCCGGCAGCGC 1100 
VTVMASP GGF VEFSRQ R 
GGGCTCGCGCCGGACGGGCGGGCGAAGGCGTTCGGCGCGGGCGCGGACGG 1150 

GLA P DG RAKA F GA GAD G 
TACGAGCTTCGCCGAGGGCGCCGGTGCCCTGGTGGTCGAGCGGCTCTCCG 1200 
60 TSFAEGAGALVVERLS 

ACGCGGAGCGCCACGGCCACACCGTCCTCGCCCTCGTACGCGGCTCCGCG 1 2 50 
DAERHGHT VLALVRGSA 
gctaactccgacggcgcgtcgaacggtctgtcggcgccgaacggcccctc 1300 

a n s d g a s n g l s a p n g p s 
ccaggaacgcgtcatccaccaggccctcgcgaacgcgaaactcacccccg 1350 
q e r v i hqalanaklt p 
5 ccgatgtcgacgcggtcgaggcgcacggcaccggcacccgcctcggcgac 1400 
advdaveahgtgtrlgd 
cccatcgaggcgcaggcgctgctcgcgacgtacggacaggaccgggcgac 1450 

pieaqallatygqdrat 
gcccctgctgctcggctcgctgaagtcgaacatcgggcacgcccaggccg 1500 
10 pll lgslksn i ghaqa 

cgtcaggggtcgccgggatcatcaagatggtgcaggccatccggcacggg 1550 
asgvagi ikm vqairhg 
gaactgccgccgacactgcacgcggacgagccgtcgccgcacgtcgactg 1 600 
el p ptlhade.ps phvdw 
15 gacggccggtgccgtcgagctcctgacgtcggcccggccgtggccgggga 1650 
t agave'lltsarpwpg 
ccggtcgccctaggcgggcaggcgtgtcgtccttcgggatcagtggcacc 1700 
tgrprragvssfgi. sgt 
aacgcccacgtcatcctggaaagcgcaccccccactcagcctgcggacaa 1750 
20 nahvi lesapptqpadn 

cgcggtgatcgagcgggcaccggagtgggtgccgttggtgatttcggcca 1800 

avierapewvplvisa 
gg acccagtcggctttgactgagcacgagggccggttgcgtgcgtatctg 1850 
rtqsaltehegrlrayl 
25 gcggcgtcgcccggggtggatatgcgggctgtggcatcgacgctggcgat 1900 
aaspgvd mravastlam 
gacacggtcggtgttcgagcaccgtgccgtgctgctgggagatgacaccg 1950 

trsvfehravl. lgddt 
tcaccggcaccgctgtgtctgaccctcgggcggtgttcgtcttcccggga 2000 
30 vt gtavsdpravfvfpg 

caggggtcgcagcgtgctggcatgggtgaggaactggccgccgcgttccc 2050 

qgs qr agm geelaaaf p 
cgtcttcgcgcggatccatcagcaggtgtgggacctgctcgatgtgcccg 2100 
v fa r i h q qvwdll dv p 
35 atctggaggtgaacgagaccggttacgcccagccggccctgttcgcaatg 2150 
dlevnetgyaqpalfam 
caggtggctctgttcgggctgctggaatcgtggggtgtacgaccggacgc 2200 
qvalfglleswgvrpda 
ggtgatcggccattcggtgggtgagcttgcggctgcgtatgtgtccgggg 2250 
40 v i gh svgelaaayv s g 

tgtggtcgttggaggatgcctgcactttggtgtcggcgcgggctcgtctg 2300 
vwsledactlvsararl 
atgcaggctctgcccgcgggtggggtgatggtcgctgtcccggtctcgga 2350 
mqalpaggvm vavpvse 
45 ggatgaggcccgggccgtgctgggtgagggtgtggagatcgccgcggtca 2400 
dearavlgegveiaav 
acggcccgtcgtcggtggttctctccggtgatgaggccgccgtgctgcag 2450 
n g p s s v v l s g d e a a v l q 
gccgcggaggggctggggaagtggacgcggctggcgaccagccacgcgtt 2500 
50 aaeglgkwtrl atshaf 

ccattccgcccgtatggaacccatgctggaggagttccgggcggtcgccg 2550 

hsarmepmleefrava 
aaggcctgacctaccggacgccgcaggtctccatggccgttggtgatcag 2600 
egltyrtpqvsmavgdq 
55 gtgaccaccgctgagtactgggtgcggcaggtccgggacacggtccggtt 2650 
vttaeywvrqvr dt v rf 
cggcgagcaggtggcctcgtacgaggacgccgtgttcgtcgagctgggtg 2 7 00 

geqvasyedavfvelg 
ccgaccggtcactggcccgcctggtcgacggtgtcgcgatgctgcacggc 2750 
60 adr slarlvdgvaml hg 

gaccacgaaatccaggccgcgatcggcgccctggcccacctgtatgtcaa 2800 - - 

d h e . i . q a a i g ala h l y v n 
cggcgtcacggtcgactggcccgcgctcctgggcgatgctccggcaacac 2850 
gvtvdwpallgdapat 
GGGTGCTGGACCTTCCGACATACGCCTTCCAGCACCAGCGCTACTGGCTC 2900 
RVLDLPTYAFQHQRYWL 
GAGTCGGCTCCCCCGGCCACGGCCGACTCGGGCCACCCCGTCCTCGGCAC 2950 

ESAP PATADS G H PVLGT 
CGGAGTCGCCGTCGCCGGGTCGCCGGGCCGGGTGTTCACGGGTCCCGTGC 3000 

GVAVAGS PGRVFTGP V 
CCGCCGGTGCGGACCGCGCGGTGTTCATCGCCGAACTGGCGCTCGCCGCC 3050 
P A G ' A D RAV F I A E LAL AA 
GCCGACGCCACCGACTGCGCCACGGTCGAACAGCTCGACGTCACCTCCGT 3100 

A D A T D C A T V E Q L D V T S V 
GCCCGGCGGATCCGCCCGCGGCAGGGCCACCGCGCAGACCTGGGTCGATG 3150 

PGGSARGR A TAQTWVD 
AACCCGCCGCCGACGGGCGGCGCCGCTTCACCGTCCACACCCGCGTCGGC 3200 
EPAAD GRRRFTVHTRVG 
GACGCCCCGTGGACGCTGCACGCCGAGGGGGTTCTCCGCCCCGGCCGCGT 3250 

DAPWTLHAEGVLR PGRV 
GCCCCAGCCCGAAGCCGTCGACACCGCCTGGCCCCCGCCGGGCGCGGTGC 3300 

P Q P E AV D TA W P P P . G A V ' 
CCGCGGACGGGCTGCCCGGGGCGTGGCGACGCGCGGACCAGGTCTTCGTC 3350 
PA D G L P G A W R R A D Q V F V 
GAAGCCGAAGTCGACAGCCCTGACGGCTTCGTGGCACACCCCGACCTGCT 3400 

EA E V D S P DG FVAH P DL L 
CGACGCGGTCTTCTCCGCGGTCGGCGACGGGAGCCGCCAGCCGACCGGAT 3450 

DAV.F S AVG DG S RQ P T G 
GGCGCGACCTCGCGGTGCACGCGTCGGACGCCACCG TGCTGCGCGCCTGC 3500 
WRDLAVHASDATVLRAC 
CTCACCCGCCGCGACAGTGGTGTCGTGGAGCTCGCCGCCTTCGACGGTGC 3550 

LTRRDSGVVELAAFDGA 
CGGAATGCCGGTGCTCACCGCGGAGTCGGTGACGCTGGGCGAGGTCGCGT 3600 

GMPVLTAESVTLGEVA 
CGGCAGGCGGATCCGACGAGTCGGACGGTCTGCTTCGGCTTGAGTGGTTG 3650 
SAGGS DESDGLLRLEW L 
CCGGTGGCGGAGGCCCACTACGACGGTGCCGACGAGCTGCCCGAGGGCTA 3700 

PVAEAHYDGADELPEG Y 
CACCCTC ATC ACCGCCAC ACACCCCG ACG ACCCCGACG ACCCCACC AACC 3750 

TL I TAT H PDDPDD PT N 
CCCACAACACACCCACACGCACCCACACACAAACCACACGCGTCCTCACC 3800 
PHNTPTRTHTQTTR VL T 
GCCCTCCAACACCACCTCATCACCACCAACCACACCCTCATCGTCCACAC 3850 

ALQH H L I TTN H T L I VH T 
CACCACCGACCCCCCAGGCGCCGCCGTCACCGGCCTCACCCGCACCGCAC 3900 

TTDPPG AA VTGLTRT A 
AAAACGAACACCCCGGCCGCATCCACCTCATCGAAACCCACCACCCCCAC 3950 
QNEH PGRI H L I E T HHP H 
ACCCCACTCCCCCTCACCCAACTCACCACCCTCCACCAACCCCACCTACG 4 000 

T P L P L TQLTT L H Q PHL R 
CCTCACCAACAACACCCTCCACACCCCCCACCTCACCCCCATCACCACCC 4 05 0 

L T N N T L H T P H L T PIT T * 
ACCACAACACCACCACAACCACCCCCAACACCCCACCCCTCAACCCCAAC 4100 
HHNTTTTTPNTPPLNPN 
CACGCCATCCTCATCACCGGCGGCTCCGGC ACCCTCGCCGGCATCCTCGC 4150 

H A I L I T G G S G T L A G I L A 
CCGCCACCTCAACCACCCCCACACCTACCTCCTCTCCCGCACACCACCAC 4200 

R H L N H P H T Y L L S R T P P 
CCCCCACCACACCCGGCACCCACATCCCCTGCGACCTCACCGACCCCACC 4 250 

PPTT PGTH I PCDL TDPT 

CAAATCACCCAAGCCCTCACCCACATACCACAACCCCTCACCGGCATCTT 4 300 

Q I TQAL T H I P Q P L T G I F 
CCACACCGCCGCCACCCTCGACGACGCCACCCTCACCAACCTCACCCCCC 4 350 

H T A A T L D D A T L T N L T*P 
AACACCTCACCACCACCCTCCAACCCAAAGCCGACGCCGCCTGGCACCTC 4 400 
QHLTTTLQPKADAAW HL 
CACCACCACACCC AAAACCAACCCCTCACCCACTTCGTCCTCTACTCCAG 4 450 
HHHTQNQPLTH FVLYSS 
CGCCGCCGCCACCCTCGGCAGCCCCGGCCAAGCCAACTACGCCGCCGCCA 4500 

AAA T LGS PGQANYAAA 
ACGCCTTCCTCGACGCCCTCGCCACCCACCGCCACACCCAAGGACAACCC 4550 
NAFLDALAT HRHTQGQ P 
5 GCCACCACCATCGCCTGGGGCATGTGGCACACCACCACCACACTCACCAG 4 600 
ATT IAWGMWHTTT TLTS 
CCAACTCACCGACAGCGACCGCGACCGCATCCGCCGCGGCGGCTTCCTGC 4 650 

QLTDSDRDR IRRGG FL 
CGATCTCGGACGACGAGGGCATGC 
10 P I S D D E G M 

The AvrII-XhoI hybrid FK-506 PKS module 8 containing the AT domain of 
module 13 of rapamycin is shown below. 

GCATGCGGCTGTACGAGGCGGCACGGCGCACCGGAAGTCCCGTGGTGGTG 5 0 
15 MRLYEA ARRTGSPVVV 

GCGGCCGCGCTCGACGACGCGCCGGACGTGCCGCTGCTGCGCGGGCTGCG 100 

AAALDDAP D .VPLLRGLR 
GCGTACGACCGTCCGGCGTGCCGCCGTCCGGGAACGCTCTCTCGCCGACC 150 
RTTVRRAAVRERSLAD 
20 GCTCGCCGTGCTGCCCGACGACGAGCGCGCCGACGCCTCCCTCGCGTTCG 200 
RSPCCPTTSA PTPPSRS 
TCCTGGAACAGCACCGCCACCGTGCTCGGCCACCTGGGCGCCGAAGACAT 250 

SWNSTAT VLGHLGAEDI 
CCCGGCGACGACGACGTTCAAGGAACTCGGCATCGACTCGCTCACCGCGG 300 
25 PATTTFKELGIDSLTA 

TCCAGCTGCGCAACGCGCTGACCACGGCGACCGGCGTACGCCTCAACGCC 350 
VQLRNALTTATGVRLNA 
ACAGCGGTCTTCGACTTTCCGACGCCGCGCGCGCTCGCCGCGAGACTCGG 4 00 
T AV FD F P T PRALAARL G 
30 CGACGAGCTGGCCGGTACCCGCGCGCCCGTCGCGGCCCGGACCGCGGCCA 4 50 
DE LAG T RAP VA ARTAA 
CCGCGGCCGCGCACGACGAACCGCTGGCGATCGTGGGCATGGCCTGCCGT 500 
TAAAH DE PLAIVGMACR 
CTGCCGGGCGGGGTCGCGTCGCCACAGGAGCTGTGGCGTCTCGTCGCGTC 550 
L P GGVAS PQELWRLVAS 

CGGCACCGACGCCATCACGGAGTTCCCCGCGGACCGCGGCTGGGACGTGG 600 

GTDAI TEFPADRGWDV 
ACGCGCTCTACGACCCGGACCCCGACGCGATCGGCAAGACCTTCGTCCGG 650 
DALYDPDPDAIGKTFVR 
40 C ACGGCGGCTTCCTCGACGGTGCG ACCGGCTTCGACGCGGCGTTCTTCGG 700 
HGG FLDGATGFDAAFFG 
GATCAGCCCGCGCGAGGCCCTGGCCATGGACCCGCAGCAACGGGTGCTCC 750 

I S PREALAMDPQQRVL 
TGGAGACGTCCTGGGAGGCGTTCGAAAGCGCGGGCATCACCCCGGACGCG 800 
45 LETSWEAFESAGITPDA 

GCGCGGGGCAGCGACACCGGCGTGTTCATCGGCGCGTTCTCCTACGGGTA 850 

ARG SDTG VFIGAFS Y GY 
CGGCACGGGTGCGGATACCAACGGCTTCGGCGCGACAGGGTCGCAGACCA 900 
GTGADTNGFGATGSQT 
50 GCGTGCTCTCCGGCCGCCTCTCGTACTTCTACGGTCTGGAGGGCCCTTCG 950 
SVLSGRLSYF YG LEGPS 

GTCACGGTCGACACCGCCTGCTCGTCGTCACTGGTCGCCCTGCACCAGGC 1000 

VTVDTACSSSLVALHQA 
AGGGCAGTCCCTGCGCTCGGGCGAATGCTCGCTCGCCCTGGTCGGCGGTG 1050 

55 G Q S L R S G E C S L A L V G G 

TCACGGTGATGGCGTCGCCCGGCGGATTCGTCGAGTTCTCCCGGCAGCGC 1100 - 
VTVMAS PGGFVEFSRQ R 
GGGCTCGCGCCGGACGGGCGGGCGAAGGCGTTCGGCGCGGGCGCGGACGG 1150 
GLA PDG R AKAFGAGADG 
60 TACGAGCTTCGCCGAGGGCGCCGGTGCCCTGGTGGTCGAGCGGCTCTCCG 1200 
TSFAEG A GALVVERLS 
ACGCGGAGCGCCACGGCCACACCGTCCTCGCCCTCGTACGCGGCTCCGCG 1250 
DAE RH G H T VLALVRG SA 
GCTAACTCCGACGGCGCGTCGAACGGTCTGTCGGCGCCGAACGGCCCCTC 1300 

ANS DGASNGLSAPNGPS 
CCAGGAACGCGTCATCCACCAGGCCCTCGCGAACGCGAAACTCACCCCCG 1350 
5 QERVI HQALANAKLTP 

CCGATGTCGACGCGGTCGAGGCGCACGGCACCGGCACCCGCCTCGGCGAC 1400 
ADVDAVEAHGT GTRL GD 
CCCATCGAGGCGCAGGCGCTGCTCGCGACGTACGGACAGGACCGGGCGAC 1450 
PI EAQALLATYGQDRAT 
10 GCCCCTGCTGCTCGGCTCGCTGAAGTCGAACATCGGGCACGCCCAGGCCG 1500 
PLLLGSLKSNI GHAQA 
CGTCAGGGGTCGCCGGGATCATCAAGATGGTGCAGGCCATCCGGCACGGG 1550 
AS G VAG I I KMVQA I R H G 
GAACTGCCGCCGACACTGCACGCGGACGAGCCGTCGCCGCACGTCGACTG 1 600 
15 ELPPTLHADEPSPHVDW 

GACGGCCGGTGCCGTCGAGCTCCTGACGTCGGCCCGGCCGTGGCCGGGGA 1650 

TAGAVELLTSARPWP.G 
CCGGTCGCCCTAGGCGGGCGGGCGTGTCGTCCTTCGGAGTCAGCGGCACC 1700 
TGRPRRAGVSSFGVSGT 
20 AACGCCCACGTCATCCTGGAGAGCGCACCCCCCGCTCAGCCCGCGGAGGA 1750 
NAHV I LESAPPAQP AEE 
GGCGCAGCCTGTTGAGACGCCGGTGGTGGCCTCGGATGTGCTGCCGCTGG 1800 

AQ PVETPVVASDVLP L 
TGATATCGGCCAAGACCCAGCCCGCCCTGACCGAACACGAAGACCGGCTG 1850 
25 VI SAKTQPALTEHEDRL 

CGCGCCTACCTGGCGGCGTCGCCCGGGGCGGATATACGGGCTGTGGCATC 1 900 

RAYLAASPGADIRAVAS 
GACGCTGGCGGTGACACGGTCGGTGTTCGAGCACCGCGCCGTACTCCTTG 1950 
TLAVTRSVFEHRAVLL 
30 GAGATGACACCGTCACCGGCACCGCGGTGACCGACCCCAGGATCGTGTTT 2000 
GDDTVTGTAVTDPRIVF 
GTCTTTCCCGGGCAGGGGTGGCAGTGGCTGGGGATGGGCAGTGCACTGCG 2050 

VFPG Q GWQWL GMGSALR 
CGATTCGTCGGTGGTGTTCGCCGAGCGGATGGCCGAGTGTGCGGCGGCGT 2100 
35 DS S V V FAE R MAECAAA 

TGCGCGAGTTCGTGGACTGGGATCTGTTCACGGTTCTGGATGATCCGGCG 2150 
LRE FVDWDLFTVLDDPA 
GTGGTGGACCGGGTTGATGTGGTCCAGCCCGCTTCCTGGGCGATGATGGT 2200 
VVDRVDVVQPASWAMMV 
40 TTCCCTGGCCGCGGTGTGGCAGGCGGCCGGTGTGCGGCCGGATGCGGTGA 2250 
S LAAVWQAAGVRP DA V 
TCGGCCATTCGCAGGGTGAGATCGCCGCAGCTTGTGTGGCGGGTGCGGTG 2300 
I GH S QGE IAAACVAGA .V 
TCACTACGCGATGCCGCCCGGATCGTGACCTTGCGCAGCCAGGCGATCGC 2350 
45 SLRDAARIVTLRSQAIA 

CCGGGGCCTGGCGGGCCGGGGCGCGATGGCATCCGTCGCCCTGCCCGCGC 2400 

R G L A G R G A M A S V A L P A 
AGGATGTCGAGCTGGTCGACGGGGCCTGGATCGCCGCCCACAACGGGCCC 2450 
Q D V E L V D G A W I A A H N G P 
50 GCCTCCACCGTGATCGCGGGCACCCCGGAAGCGGTCGACCATGTCCTCAC 2500 
ASTVIAGTPEAVDHVLT 
CGCTCATGAGGCACAAGGGGTGCGGGTGCGGCGGATCACCGTCGACTATG 2550 

AHEAQGVRVRRI TV D Y 
CCTCGCACACCCCGCACGTCGAGCTGATCCGCGACGAACTACTCGACATC 2600 
55 ASHTPHVELIRDELLDI 

ACTAGCGACAGCAGCTCGCAGACCCCGCTCGTGCCGTGGCTGTCGACCGT 2650 

T S D S S S Q T P L V P W L S T v .- 
GGACGGCACCTGGGTCGACAGCCCGCTGGACGGGGAGTACTGGTACCGGA 2700 
DGTWVDSPLDGEYWY R 
60 ACCTGCGTGAACCGGTCGGTTTCCACCCCGCCGTCAGCCAGTTGCAGGCC 2750 
NLRE P VGFH PAVSQLQA 
CAGGGCGACACCGTGTTCGTCGAGGTCAGCGCCAGCCCGGTGTTGTTGCA 2800 

QGD TVFVE V SASPVL LQ 
GGCGATGGACGACGATGTCGTCACGGTTGCCACGCTGCGTCGTGACGACG 2850 
AMDDDVVTVATLRRDD 
GCGACGCCACCCGGATGCTCACCGCCCTGGCACAGGCCTATGTCCACGGC 2900 
GDATRMLTALAQAYVHG 
GTCACCGTCGACTGGCCCGCCATCCTCGGCACCACCACAACCCGGGTACT 2950 
5 VTVDWPAILGTTTTRVL 

GGACCTTCCGACCTACGCCTTCCAACACCAGCGGTACTGGCTCGAGTCGG 3000 

DLPTYAF.QH QRYWLES 
CTCCCCCGGCCACGGCCGACTCGGGCCACCCCGTCCTCGGCACCGGAGTC 3050 
A P P A T A D S G H P V L G T G V 
1 0 GCCGTCGCCGGGTCGCCGGGCCGGGTGTTCACGGGTCCCGTGCCCGCCGG 3100 
AVAG S PGRVF T GPVPAG 
TGCGGACCGCGCGGTGTTCATCGCCGAACTGGCGCTCGCCGCCGCCGACG 3150 

ADRAVFIAELALAAAD 
CCACCGACTGCGCCACGGTCGAACAGCTCGACGTCACCTCCGTGCCCGGC 3200 
15 ATDCATV EQLDVTSVPG 

GGATCCGCCCGCGGCAGGGCCACCGCGCAGACCTGGGTCGATGAACCCGC 3250 

GSARG RATAQTWVDEPA 
CGCCGACGGGCGGCGCCGCTTCACCGTCCACACCCGCGTCGGCGACGCCC 3300 
ADGRRRFTVHTRVGDA 
20 CGTGGACGCTGCACGCCGAGGGGGTTCTCCGCCCCGGCCGCGTGCCCCAG 3350 
PWTLHAEGVLRPGRVPQ 
CCCGAAGCCGTCGACACCGCCTGGCCCCCGCCGGGCGCGGTGCCCGCGGA 34 00 

PEAVDTAWPPP GAVPA D 
CGGGCTGCCCGGGGCGTGGCGACGCGCGGACCAGGTCTTCGTCGAAGCCG 3450 
25 GLPGAWRRADQVFVEA 

AAGTCGACAGCCCTGACGGCTTCGTGGC ACACCCCGACCTGCTCGACGCG 3500 
EV DS P D G FVAH P DL LDA 
GTCTTCTCCGCGGTCGGCGACGGGAGCCGCCAGCCGACCGGATGGCGCGA 3550 
VFSAVGDGSRQPTGWRD 
30 CCTCGCGGTGCACGCGTCGGACGCCACCGTGCTGCGCGCCTGCCTCACCC 3600 
LAV HA S D ATVLRACLT 
GCCGCGACAGTGGTGTCGTGGAGCTCGCCGCCTTCGACGGTGCCGGAATG 3650 
RRDS GVVELAAFDGAGM 
CCGGTGCTCACCGCGGAGTCGGTGACGCTGGGCGAGGTCGCGTCGGCAGG 37 00 
35 PVLTAESVTLGEVASAG 

CGGATCCGACGAGTCGGACGGTCTGCTTCGGCTTGAGTGGTTGCCGGTGG 3750 

GS DES DGLLRLEWLPV 
CGGAGGCCCACTACGACGGTGCCGACGAGCTGCCCGAGGGCTACACCCTC 3800 
AEAHY DGADELPEG YTL 
40 ATCACCGCCACACACCCCGACGACCCCGACGACCCCACCAACCCCCACAA 38 50 
ITATHPDDPDDPTNPHN 
C AC ACCCAC ACGCACCCAC AC ACAAACCACACGCGTCCTCACCGCCCTCC 3900 

TPTRTHTQTTRVLTAL 
AACACCACCTCATCACCACCAACCACACCCTCATCGTCCACACCACCACC 3950 
45 QHHLI TTNHTLIVHTTT 

G ACCCCCC AGGCGCCGCCGTC ACCGGCCTC ACCCGCACCGC AC AAAACGA 4000 

D P PGAAVTGL T RTAQN E 
ACACCCCGGCCGC ATCCACCTCATCG AAACCC ACC ACCCCC ACACCCC AC 4 050 
H P G R I H L I E T H H P H T P 
50 TCCCCCTCACCCAACTCACCACCCTCCACCAACCCCACCTACGCCTCACC 4100 
LPLTQLTTLHQPHLRLT 
AACAACACCCTCCACACCCCCCACCTCACCCCCATCACCACCC ACCACAA 4150 

N N T L H T P H L T P I T T H H N 
CACC ACCACAACCACCCCCAACACCCCACCCCTCAACCCCAACCACGCCA 4 2 00 
55 TTTTTPNT PP LNP NHA 

TCCTCATCACCGGCGGCTCCGGCACCCTCGCCGGCATCCTCGCCCGCCAC 4250 
I L I TG G S GTL AG I LARH 
CTCAACCACCCCC ACACCT ACCTCCTCTCCCGCACACCACCACCCCCC AC 4 300 
LNHPHTYLLSRTPPPPT 
60 C ACACCCGGCACCCACATCCCCTGCGACCTCACCGACCCC ACCCAAATCA 4 350 
TPGTHIPCDLTDPTQI 
CCCAAGCCCTCACCCACATACCACAACCCCTCACCGGCATCTTCCACACC 4 400 
TQALTHI PQPLTGI F HT 
GCCGCCACCCTCGACGACGCCACCCTCACCAACCTCACCCCCCAACACCT 4450 

AATLDDATLTNLTPQHL 
CACCACCACCCTCCAACCCAAAGCCGACGCCGCCTGGCACCTCCACCACC 4500 

TTTLQP KADAAWHLHH 
ACACCCAAAACCAACCCCTCACCCACTTCGTCCTCTACTCCAGCGCCGCC 4 550 
HTQNQ PLTHFVLYSSAA 
GCCACCCTCGGCAGCCCCGGCCAAGCCAACTACGCCGCCGCCAACGCCTT 4 600 

ATLGS PGQANYA AANAF 
CCTCGACGCCCTCGCCACCCACCGCCACACCCAAGGACAACCCGCCACCA 4650 

LDALAT H RHTQGQPA-T 
CCATCGCCTGGGGCATGTGGCACACCACCACCACACTCACCAGCCAACTC 4 700 
TIAWGMWHTTTTLTSQL 
ACCGACAGCGACCGCGACCGCATCCGCCGCGGCGGCTTCCTGCCGATCTC 4750 

TDSDRDRIRRGGFLPIS 
GGACGACGAGGGCATGC 

D D E G M 

The NheI-XhoI hybrid FK-506 PKS module 8 containing the AT domain of 
module 12 of rapamycin is shown below. 

GCATGCGGCTGTACGAGGCGGCACGGCGCACCGGAAGTCCCGTGGTGGTG 50 

M R LY EAARRTG S P VV V 
GCGGCCGCGCTCGACGACGCGCCGGACGTGCCGCTGCTGCGCGGGCTGCG 100 

AAALD DAPDVPLLRGL R 
GCGTACGACCGTCCGGCGTGCCGCCGTCCGGGAACGCTCTCTCGCCGACC 150 

RT TVRRAAVRERS LAD 
GCTCGCCGTGCTGCCCGACGACGAGCGCGCCGACGCCTCCCTCGCGTTCG 200 
RSPCCPTTSAPTPPSRS 
TCCTGGAACAGCACCGCCACCGTGCTCGGCCACCTGGGCGCCGAAGACAT 250 

SWNSTA TVLGHLGAEDI 
CCCGGCGACGACGACGTTCAAGGAACTCGGCATCGACTCGCTCACCGCGG 300 

PATTTFKELGI DS LTA 
TCCAGCTGCGCAACGCGCTGACCACGGCGACCGGCGTACGCCTCAACGCC 350 

V Q L R N A L T T A T G V R L N A 
ACAGCGGTCTTCGACTTTCCGACGCCGCGCGCGCTCGCCGCGAGACTCGG 400 

TAV FD FPT PRALAARL G 
CGACGAGCTGGCCGGTACCCGCGCGCCCGTCGCGGCCCGGACCGCGGCCA 4 50 

D E L AG T RA PVAART AA 
CCGCGGCCGCGCACGACGAACCGCTGGCGATCGTGGGCATGGCCTGCCGT 500 
TAAAH DE PLAIVGMAC R 
CTGCCGGGCGGGGTCGCGTCGCCACAGGAGCTGTGGCGTCTCGTCGCGTC 550 

LPGGVAS PQELWR LVAS 
CGGCACCGACGCCATCACGGAGTTCCCCGCGGACCGCGGCTGGGACGTGG 600 - 

G T DA I T E F PAD RGW D V. 
ACGCGCTCTACGACCCGGACCCCGACGCGATCGGCAAGACCTTCGTCCGG 650 
DA L Y D P D P DA I G K T FV R 
CACGGCGGCTTCCTCGACGGTGCGACCGGCTTCGACGCGGCGTTCTTCGG 700 

H G G F L D G A T G F D A A F F G 
GATCAGCCCGCGCGAGGCCCTGGCCATGGACCCGCAGCAACGGGTGCTCC 7 50 

I S PRE A LAMDPQQRV L 
TGGAGACGTCCTGGGAGGCGTTCGAAAGCGCGGGCATCACCCCGGACGCG 800 
L E T S W E A F E S A G I T. P D A 
GCGCGGGGCAGCGACACCGGCGTGTTCATCGGCGCGTTCTCCTACGGGTA 850 

ARGSDTGVFIGAFSYGY 
CGGCACGGGTGCGGATACCAACGGCTTCGGCGCGACAGGGTCGCAGACCA 900 

GTGADT NGFGATGSQ T 
GCGTGCTCTCCGGCCGCCTCTCGTACTTCTACGGTCTGGAGGGCCCTTCG 950 
SVLSGRL.S YFYG.LEGP.S 
GTCACGGTCGACACCGCCTGCTCGTCGTCACTGGTCGCCCTGCACCAGGC 1000 

VTVDTA CSSSLVALHQA 
AGGGCAGTCCCTGCGCTCGGGCGAATGCTCGCTCGCCCTGGTCGGCGGTG 1050 

GQSLRSGECSLALVG G 
TCACGGTGATGGCGTCGCCCGGCGGATTCGTCGAGTTCTCCCGGCAGCGC 1100 

V T V MAS PGGFVEFSRQR 
GGGCTCGCGCCGGACGGGCGGGCGAAGGCGTTCGGCGCGGGCGCGGACGG 1150 

G L A P D G R A K A F G A G A D G 
TACGAGCTTCGCCGAGGGCGCCGGTGCCCTGGTGGTCGAGCGGCTCTCCG 1200 
TS FAEGAGALVVERLS 
5 ACGCGGAGCGCCACGGCCACACCGTCCTCGCCCTCGTACGCGGCTCCGCG 1250 
DAERHGHTVLALVRGSA 
GCTAACTCCGACGGCGCGTCGAACGGTCTGTCGGCGCCGAACGGCCCCTC 1300 

ANS DGASNG L. SAPNGPS 
CCAGG AACGCGTCATCCACCAGGCCCTCGCGAACGCGAAACTCACCCCCG 1350 
10 QE RVI H Q.ALAN AKL TP 

CCGATGTCGACGCGGTCGAGGCGCACGGCACCGGCACCCGCCTCGGCGAC 14 00 
ADVDAVEAHGTGTRLGD 
CCCATCGAGGCGCAGGCGCTGCTCGCGACGTACGGACAGGACCGGGCGAC 1450 
P I EAQALLATYG.QDRAT 
1 5 GCCCCTGCTGC TCGGCTCGCTG AAGTCGAACATCGGGCACGCCCAGGCCG 1500 
PLLLGS LK.SN I GHAQA 
CGTCAGGGGTCGCCGGGATCATCAAGATGGTGCAGGCCATCCGGCACGGG 1550 
ASGVAGI IKMVQAI RHG 
GAACTGCCGCCGACACTGCACGCGGACGAGCCGTCGCCGCACGTCGACTG 1600 
E L P PT L H ADE P S P H VD W 

GACGGCCGGTGCCGTCGAGCTCCTGACGTCGGCCCGGCCGTGGCCGGGGA 1650 

TAGAV EL.LT SARPWPG 
CCGGTCGCCCGCGCCGCGCTGCCGTCTCGTCGTTCGGCGTGAGCGGCACG 1700 
TGRPRRAAVSSF GVSGT 
25 AACGCCCACATCATCCTTGAGGCAGGACCGGTCAAAACGGGACCGGTCGA 1750 
NAH I I LE-AGPVKTG PVE 
GGCAGGAGCGATCGAGGCAGGACCGGTCGAAGTAGGACCGGTCGAGGCTG 1800 

AGAIEAG. PVEVGPVEA 
GACCGCTCCCCGCGGCGCCGCCGTCAGCACCGGGCGAAGACCTTCCGCTG 1850 
30 GPLPAAPPSA PGEDLPL 

CTCGTGTCGGCGCGTTCCCCGGAGGCACTCGACGAGCAGATCGGGCGCCT 1900 

L VSARSPEALDEQI GRL 
GCGCGCCTATCTCGACACCGGCCCGGGCGTCGACCGGGCGGCCGTGGCGC 1950 
RAYLDTGPGVDRAAVA 
35 AGACACTGGCCCGGCGTACGCACTTCACCCACCGGGCCGTACTGCTCGGG 2000 
QTLARRTH FTHRAVLLG 
GACACCGTCATCGGCGCTCCCCCCGCGGACCAGGCCGACGAACTCGTCTT 2050 

DTVI GAP PADQADELVF 
CGTCTACTCCGGTCAGGGCACCCAGCATCCCGCGATGGGCGAGCAGCTAG 2100 
40 VYSGQ GTQH PAMGEQL 

CCGCCGCGTTCCCCGTCTTCGCGCGGATCCATCAGCAGGTGTGGGACCTG 2150 
AAAFPV F.ARIHQQVW DL 
CTCGATGTGCCCGATCTGGAGGTGAACGAGACCGGTTACGCCCAGCCGGC 2200 
LDVPDLEVN.ETGYAQPA 
45 CCTGTTCGCAATGCAGGTGGCTCTGTTCGGGCTGCTGGAATCGTGGGGTG 2250 
LFAMQVA LFGLLESWG 
TACGACCGGACGCGGTGATCGGCCATTCGGTGGGTGAGCTTGCGGCTGCG 2300 
V R P D A V I G H SVG E L A A A 
TATGTGTCCGGGGTGTGGTCGTTGGAGGATGCCTGCACTTTGGTGTCGGC 2350 
50 YVSGVWS.LEDACTLVSA 

GCGGGCTCGTCTGATGCAGGCTCTGCCCGCGGGTGGGGTGATGGTCGCTG 24 00 

RARLMQALPAGGVMVA 
TCCCGGTCTCGGAGGATGAGGCCCGGGCCGTGCTGGGTGAGGGTGTGGAG 24 50 
VPVSE DEARAVLGEGVE 
55 ATCGCCGCGGTCAACGGCCCGTCGTCGGTGGTTCTCTCCGGTGATGAGGC 2500 
IAAVNGP.SSVVLSGDEA 
CGCCGTGCTGCAGGCCGCGGAGGGGCTGGGGAAGTGGACGCGGCTGGCGA 2550 

AVLQAAE. GLGKWTRLA 
CCAGCCACGCGTTCCATTCCGCCCGTATGGAACCCATGCTGGAGGAGTTC 2600 
60 TSHAFHSA RMEPMLEEF 

CGGGCGGTCGCCGAAGGCCTGACCTACCGGACGCCGCAGGTCTCCATGGC 2650 

RAV-AEGLTYRTPQVSMA 
CGTTGGTGATCAGGTGACCACCGCTGAGTACTGGGTGCGGCAGGTCCGGG 2700 
VGDQVTTAEYWVRQVR 
ACACGGTCCGGTTCGGCGAGCAGGTGGCCTCGTACGAGGACGCCGTGTTC 2750 
DTVRFGEQVASYEDAVF 
GTCGAGCTGGGTGCCGACCGGTCACTGGCCCGCCTGGTCGACGGTGTCGC 2800 
VE L G A D R S LARLVDGVA 
5 GATGCTGCACGGCGACCACGAAATCCAGGCCGCGATCGGCGCCCTGGCCC 2850 
ML H G D H E I QAAI GAL A 
ACCTGTATGTCAACGGCGTCACGGTCGACTGGCCCGCGCTCCTGGGCGAT 2900 
HLYVNGVTVDWP A LL GD 
GCTCCGGCAACACGGGTGCTGGACCTTCCGACATACGCCTTCCAGCACCA 2950 
10 APATRVLDLPTYAFQHQ 

GCGCTACTGGCTCGAGTCGGCTCCCCCGGCCACGGCCGACTCGGGCCACC 3000 

RYWLESAPP ATADSG H 
CCGTCCTCGGCACCGGAGTCGCCGTCGCCGGGTCGCCGGGCCGGGTGTTC 3050 
PVLGTGVAVAGS PGRVF 
15 ACGGGTCCCGTGCCCGCCGGTGCGGACCGCGCGGTGTTCATCGCCGAACT 3100 
TGPVPAGADRAVFIAEL 
GGCGCTCGCCGCCGCCGACGCCACCGACTGCGCCACGGTCGAACAGCTCG 31 5 0 

ALAAADAT DCA T V EQL 
ACGTCACCTCCGTGCCCGGCGGATCCGCCCGCGGCAGGGCCACCGCGCAG 3200 
20 DVTSVPGGSARGRATA Q 

ACCTGGGTCGATGAACCCGCCGCCGACGGGCGGCGCCGCTTCACCGTCCA 3250 

TWVDE PAA D GRRR FTVH 
CACCCGCGTCGGCGACGCCCCGTGGACGCTGCACGCCGAGGGGGTTCTCC 3300 
TRVG D A PWTLHAEGV L 
25 GCCCCGGCCGCGTGCCCCAGCCCGAAGCCGTCGACACCGCCTGGCCCCCG 3350 
RPGRVPQPEAVDTAWP P 
CCGGGCGCGGTGCCCGCGGACGGGCTGCCCGGGGCGTGGCGACGCGCGGA 3400 

PGAVPADGLPGAWRRAD 
CCAGGTCTTCGTCGAAGCCGAAGTCGACAGCCCTGACGGCTTCGTGGCAC 3450 
30 QVFVEAEVDS PDGFVA 

ACCCCGACCTGCTCGACGCGGTCTTCTCCGCGGTCGGCGACGGGAGCCGC 3500 
HPDLL D AV FSAVGD GSR 
CAGCCGACCGGATGGCGCGACCTCGCGGTGCACGCGTCGGACGCCACCGT 3550 
Q P T'G W R D LAV H AS DAT V 
35 GCTGCGCGCCTGCCTCACCCGCCGCGACAGTGGTGTCGTGGAGCTCGCCG 3600 
LRACLTRRDSGVVEL A 
CCTTCGACGGTGCCGGAATGCCGGTGCTCACCGCGGAGTCGGTGACGCTG 3650 
AFDGAGMPVLTAESVTL 
GGCGAGGTCGCGTCGGCAGGCGGATCCGACGAGTCGGACGGTCTGCTTCG 37 00 
40 GEVASAGGS DESDGLLR 

GCTTGAGTGGTTGCCGGTGGCGGAGGCCCACTACGACGGTGCCGACGAGC 3750 

LEWLPV AEAH.YDG A D E 
TGCCCGAGGGCTACACCCTCATCACCGCCACACACCCCGACGACCCCGAC 3800 
LPEGYTLITATHP DD P D 
GACCCCACCAACCCCCACAACACACCCACACGCACCCACACACAAACCAC 3850 
D P T N P H N T P T R T H T Q T T 
ACGCGTCCTCACCGCCCTCCAACACCACCTCATCACCACCAACCACACCC 3900 

R V L T A L Q H H L I T T N H T 
TCATCGTCCACACCACCACCGACCCCCCAGGCGCCGCCGTCACCGGCCTC 3950 
50 L I V H T T T D P P GAA V TG L 

ACCCGCACCGCACAAAACGAACACCCCGGCCGCATCCACCTCATCGAAAC 4000 

TRTAQ NEHP GRI H L I E T 
CCACCACCCCCACACCCCACTCCCCCTCACCCAACTCACCACCCTCCACC 4050 
H H P H T P L P L .T Q L T T " L H 
55 AACCCCACCTACGCCTCACCAACAACACCCTCCACACCCCCCACCTCACC 4100 
QPHLRLTNNT LHTP HL T 
CCCATCACCACCCACCACAACACCACCACAACCACCCCCAACACCCCACC 4150 

P I T T H H N" T T T. T T P N T P P 
CCTCAACCCCAACCACGCCATCCTCATCACCGGCGGCTCCGGCACCCTCG 4200 
60 LNPNH AI LIT GGSG T L 

CCGGCATCCTCGCCCGCCACCTCAACCACCCCCACACCTACCTCCTCTCC 4250 
AGILARHLNHPHTYLLS 
CGCACACCACCACCCCCCACCACACCCGGCACCCACATCCCCTGCGACCT .4300 
RTPPPPTTPGTHIP CDL 
CACCGACCCCACCCAAATCACCCAAGCCCTCACCCACATACCACAACCCC 4350 

TDPTQITQ'ALTHIPQP 
TCACCGGCATCTTCCACACCGCCGCCACCCTCGACGACGCCACCCTCACC 44 00 
LTG I FHTAATLDDATLT 
5 AACCTCACCCCCCAACACCTCACCACCACCCTCCAACCCAAAGCCGACGC 4 450 
NLT PQHLTTTLQPKADA 
CGCCTGGCACCTCCACCACCACACCCAAAACCAACCCCTCACCCACTTCG 4500 

AWHLHHHT Q NQPLTHF 
TCCTCTACTCCAGCGCCGCCGCCACCCTCGGCAGCCCCGGCCAAGCCAAC 4 550 
10 V L Y S S AAAT LGS PG QAN 

TACGCCGCCGCCAACGCCTTCCTCGACGCCCTCGCCACCCACCGCCACAC 4 600 

YAAANAFLDALA THRHT 
CCAAGGACAACCCGCCACCACCATCGCCTGGGGCATGTGGCACACCACCA 4 650 
QGQPATTIAWGM WHTT 
15 CCACACTCACCAGCCAACTCACCGACAGCGACCGCGACCGCATCCGCCGC 4 700 
TTLTSQL TDSD RDRIRR 
GGCGGCTTCCTGCCGATCTCGGACGACGAGGGCATGC 
GGFL PISDDEGM 

The NheI-XhoI hybrid FK-506 PKS module 8 containing the AT domain of 
module 13 of rapamycin is shown below. 

GCATGCGGCTGTACGAGGCGGCACGGCGCACCGGAAGTCCCGTGGTGGTG 50 

MRL YEAARRTGS P V VV 
GCGGCCGCGCTCGACGACGCGCCGGACGTGCCGCTGCTGCGCGGGCTGCG 100 
25 AAAL DDAPDVPLLRGLR 

GCGTACGACCGTCCGGCGTGCCGCCGTCCGGGAACGCTCTCTCGCCGACC 150 

RTTVRRAAVRERSLAD 
GCTCGCCGTGCTGCCCGACGACGAGCGCGCCGACGCCTCCCTCGCGTTCG 200 
RS PCCPTTSA PTPPSRS 
30 TCCTGGAACAGCACCGCCACCGTGCTCGGCCACCTGGGCGCCGAAGACAT 250 
S W N S T AT VLG H LGA E D I 
CCCGGCGACGACGACGTTCAAGGAACTCGGCATCGACTCGCTCACCGCGG 300 

PATTT F K ELGI DS LTA 
TCCAGCTGCGCAACGCGCTGACCACGGCGACCGGCGTACGCCTCAACGCC 350 
35 VQLRNALTTATGVRL.NA 

ACAGCGGTCTTCGACTTTCCGACGCCGCGCGCGCTCGCCGCGAGACTCGG 400 

TAVFDFPTPRALAARLG 
CGACGAGCTGGCCGGTACCCGCGCGCCCGTCGCGGCCCGGACCGCGGCCA 450 
DELAGTRAPVAARTAA 
40 CCGCGGCCGCGCACGACGAACCGCTGGCGATCGTGGGCATGGCCTGCCGT 500 
TAAAHDE PLAIVGMA CR 
CTGCCGGGCGGGGTCGCGTCGCCACAGGAGCTGTGGCGTCTCGTCGCGTC 550 

L PGGVAS PQELWR LVAS 
CGGCACCGACGCCATCACGGAGTTCCCCGCGGACCGCGGCTGGGACGTGG 600 
45 GTDAITE FPADR GWDV 

ACGCGCTCTACGACCCGGACCCCGACGCGATCGGCAAGACCTTCGTCCGG 650 
DA L Y D P D P DA I G K T F V R 
. CACGGCGGCTTCCTCGACGGTGCGACCGGCTTCGACGCGGCGTTCTTCGG 700 
HGG FLDGATGFDAAFFG 
50 GATCAGCCCGCGCGAGGCCCTGGCCATGGACCCGCAGCAACGGGTGCTCC 750 
I SPREALAMDPQQRVL 
TGGAGACGTCCTGGGAGGCGTTCGAAAGCGCGGGCATCACCCCGGACGCG 800 
LETSWEAFES AGITPDA 
GCGCGGGGCAGCGACACCGGCGTGTTCATCGGCGCGTTCTCCTACGGGTA 850 - - ' 

55 ARGSDTGVFIG AFSYGY 

CGGCACGGGTGCGGATACCAACGGCTTCGGCGCGACAGGGTCGCAGACCA 900 

GTGADTNGFGATGSQT 
GCGTGCTCTCCGGCCGCCTCTCGTACTTCTACGGTCTGGAGGGCCCTTCG 950 
SVL SG RLSYFYGLEGPS 
60 GTCACGGTCGACACCGCCTGCTCGTCGTCACTGGTCGCCCTGCACCAGGC 1000 
VTV-DTACSSSLVAL HQA 
AGGGCAGTCCCTGCGCTCGGGCGAATGCTCGCTCGCCCTGGTCGGCGGTG 1050 
GQSLRSGECSLALVGG 
TCACGGTGATGGCGTCGCCCGGCGGATTCGTCGAGTTCTCCCGGCAGCGC 1 100 
VTVMAS PGGFVEF's'RQR 
GGGCTCGCGCCGGACGGGCGGGCGAAGGCGTTCGGCGCGGGCGCGGACGG 1150 

GLAP DG RAKAFGAG ADG 
TACGAGCTTCGCCGAGGGCGCCGGTGCCCTGGTGGTCGAGCGGCTCTCCG 1200 

TSFAE GAGALV.VER LS 
ACGCGGAGCGCCACGGCCACACCGTCCTCGCCCTCGTACGCGGCTCCGCG 1250 
DAERHGHTVLALVR GSA 
GCTAACTCCGACGGCGCGTCGAACGGTCTGTCGGCGCCGAACGGCCCCTC 1300 

AN S DGA S N G L S A P N G P S 
CCAGGAACGCGTCATCCACCAGGCCCTCGCGAACGCGAAACTCACCCCCG 1350 

QERV I HQALANAKL- TP- 
CCGATGTCGACGCGGTCGAGGCGCACGGCACCGGCACCCGCCTCGGCGAC 1400 
ADVDAVE.AHGTG T RLGD 
CCCATCGAGGCGCAGGCGCTGCTCGCGACGTACGGACAGGACCGGGCGAC 14 50 

P I EAQA LLATYGQD RAT 
GCCCCTGCTGCTCGGCTCGCTGAAGTCGAACATCGGGCACGCCCAGGCCG 1500 

PLLLGSLKSN IGHAQA 
CGTCAGGGGTCGCCGGGATCATCAAGATGGTGCAGGCCATCCGGCACGGG 1550 
ASGVAGI I KMVQAI RHG 
GAACTGCCGCCGACACTGCACGCGGACGAGCCGTCGCCGCACGTCGACTG 1600 

ELPPTLHADE'PSPHVDW 
GACGGCCGGTGCCGTCGAGCTCCTGACGTCGGCCCGGCCGTGGCCGGGGA 1650 

TAGAVELLTSARPWPG 
CCGGTCGCCCGCGCCGCGCTGCCGTCTCGTCGTTCGGCGTGAGCGGCACG 1700 
TGRPRRAAVSSFGVSGT 
AACGCCCACATCATCCTTGAGGCAGGACCGGTCAAAACGGGACCGGTCGA 1750 

NAH I I LEA G PVK TG PVE 
GGCAGGAGCGATCGAGGCAGGACCGGTCGAAGTAGGACCGGTCGAGGCTG 1800 

AGA I E A G PVEVG PVEA 
GACCGCTCCCCGCGGCGCCGCCGTCAGCACCGGGCGAAGACCTTCCGCTG 1850 
G P L PAAP PSAPG E D LPL 
CTCGTGTCGGCGCGTTCCCCGGAGGCACTCGACGAGCAGATCGGGCGCCT 1900 

LVSARS PEALDEQIGRL 
GCGCGCCTATCTCGACACCGGCCCGGGCGTCGACCGGGCGGCCGTGGCGC 1950 

RAY L DTG PGV DRAAVA 
AGACACTGGCCCGGCGTACGCACTTCACCCACCGGGCCGTACTGCTCGGG 2000 
QTLARRTHFTHRAVL LG 
GACACCGTCATCGGCGCTCCCCCCGCGGACCAGGCCGACGAACTCGTCTT 2050 

DTVIGAPPADQADELVF 
CGTCTACTCCGGTCAGGGCACCCAGCATCCCGCGATGGGCGAGCAGCTAG 2100 

VYSGQGTQHPAMGEQL 
CCGATTCGTCGGTGGTGTTCGCCGAGCGGATGGCCGAGTGTGCGGCGGCG 2150 
ADSSVVFAERMAECA AA 
TTGCGCGAGTTCGTGGACTGGGATCTGTTCACGGTTCTGGATGATCCGGC 2200 

LREFVDWDLFTVLDD PA 
GGTGGTGGACCGGGTTGATGTGGTCCAGCCCGCTTCCTGGGCGATGATGG 2250 

V V D R V D V V Q P A S W A M M 
TTTCCCTGGCCGCGGTGTGGCAGGCGGCCGGTGTGCGGCCGGATGCGGTG 2300 
VSLAAVWQAAGVRPDAV 
ATCGGCCATTCGCAGGGTGAGATCGCCGCAGCTTGTGTGGCGGGTGCGGT 2350 

IGHSQGEIAAACVAGAV 
GTCACTACGCGATGCCGCCCGGATCGTGACCTTGCGCAGCCAGGCGATCG 2400 

S LRDAA RI V T LR S QAI 
CCCGGGGCCTGGCGGGCCGGGGCGCGATGGCATCCGTCGCCCTGCCCGCG 2450 
AR GLAG RGAMASVALPA 
CAGGATGTCGAGCTGGTCGACGGGGCCTGGATCGCCGCCCACAACGGGCC 2500 

QDVELVDGAWIAAHNGP 
CGCCTCCACCGTGATCGCGGGCACCCCGGAAGCGGTCGACCATGTCCTCA 2550 

A S T V I A G T P - E A V D H V L 
CCGCTCATGAGGCACAAGGGGTGCGGGTGCGGCGGATCACCGTCGACTAT 2600 
TAHEAQGVRVRRI TVDY 
GCCTCGCACACCCCGCACGTCGAGCTGATCCGCGACGAACTACTCGACAT 2650 
ASHTPHVEL IR DELLDI 
CACTAGCGACAGCAGCTCGCAGACCCCGCTCGTGCCGTGGCTGTCGACCG 2700 

TSDSSSQTP LVPWLS T 
TGGACGGCACCTGGGTCGACAGCCCGCTGGACGGGGAGTACTGGTACCGG 2750 
VDGTWVDSPLDGEY WYR 
AACCTGCGTGAACCGGTCGGTTTCCACCCCGCCGTCAGCCAGTTGCAGGC 2800 

NLRE PV G FH PAVSQ LQA 
CCAGGGCGAC ACCGTGTTCGTCGAGGTC AGCGCCAGCCCGGTGTTGTTGC 2850 

QG DTV FV EVSAS PVL L 
AGGCGATGGACGACGATGTCGTCACGGTTGCCACGCTGCGTCGTGACGAC 2900 
Q AM D D D V V T VAT LRR D D 
GGCGACGCCACCCGGATGCTCACCGCCCTGGCACAGGCCTATGTCCACGG 2950 

G DATRM LTALAQAYVHG 
CGTCACCGTCGACTGGCCCGCCATCCTCGGCACCACC ACAACCCGGGTAC 3000 

VTVDWPAILGTTTTRV 
TGGACCTTCCGACCTACGCCTTCCAACACCAGCGGTACTGGCTCGAGTCG 3050 
LDLPTYAFQHQRYWLES 
GCTCCCCCGGCCACGGCCGACTCGGGCCACCCCGTCCTCGGCACCGGAGT 3100 

APPATADSGH PVL GTGV 
CGCCGTCGCCGGGTCGCCGGGCCGGGTGTTCACGGGTCCCGTGCCCGCCG 3150 

AVAGSPGR VFTGPVPA 
GTGCGGACCGCGCGGTGTTCATCGCCGAACTGGCGCTCGCCGCCGCCGAC 3200 
GADRAVFIAELALAAAD 
GCCACCGACTGCGCCACGGTCGAACAGCTCGACGTCACCTCCGTGCCCGG 3250 

ATDCATVEQLDVTSVPG 
CGGATCCGCCCGCGGCAGGGCCACCGCGCAGACCTGGGTCGATGAACCCG 3300 

GSARGRATAQTWV DE P 
CCGCCGACGGGCGGCGCCGCTTCACCGTCCACACCCGCGTCGGCGACGCC 3350 
AADGRRRFTVHT RVGDA 
CCGTGGACGCTGCACGCCGAGGGGGTTCTCCGCCCCGGCCGCGTGCCCCA 3400 

PWTLHAEGVLRPGRV PQ 
GCCCGAAGCCGTCGACACCGCCTGGCCCCCGCCGGGCGCGGTGCCCGCGG 3450 

PEAVDTAWPPPGAVPA 
ACGGGCTGCCCGGGGCGTGGCGACGCGCGGACCAGGTCTTCGTCGAAGCC 3500 
DGL PGAWRRA DQV FVEA 
GAAGTCGACAGCCCTGACGGCTTCGTGGCACACCCCGACCTGCTCGACGC 3550 

EVDS PDGFVAHPDLLDA 
GGTCTTCTCCGCGGTCGGCGACGGGAGCCGCCAGCCGACCGGATGGCGCG 3600 

VFSAVGDGSRQPTGW R 
ACCTCGCGGTGCACGCGTCGGACGCCACCGTGCTGCGCGCCTGCCTCACC 3650 
DLAVHASDATVLRACLT 
CGCCGCGACAGTGGTGTCGTGGAGCTCGCCGCCTTCG ACGGTGCCGGAAT 3700 

R R D S G V V E L AA F D GAG M 
GCCGGTGCTCACCGCGGAGTCGGTGACGCTGGGCGAGGTCGCGTCGGCAG 3750 

PVLTAE SVTLGEV ASA 
GCGGATCCGACGAGTCGGACGGTCTGCTTCGGCTTGAGTGGTTGCCGGTG 3800 
GGSDES DGLLRLEWLPV 
GCGGAGGCCCACTACGACGGTGCCGACGAGCTGCCCGAGGGCTACACCCT 3850 

A E A H Y D G A D E L P E G Y T L 
C ATCACCGCC ACACACCCCGACGACCCCG ACGACCCCACCAACCCCC ACA 3900 

ITATHP DDPDDPTNPH 
ACACACCCACACGCACCCACACACAAACCACACGCGTCCTCACCGCCCTC 3950 
NT PTRT H TQTT RVLTAL 
CAACACCACCTCATCACCACCAACCACACCCTCATCGTCCACACCACCAC 4000 

QHHLITTNHTLIVHTTT 
CGACCCCCCAGGCGCCGCCGTCACCGGCCTCACCCGCACCGCACAAAACG 4050 

DPPGAAVTGLTRTAQN 
AACACCCCGGCCGCATCCACCTCATCGAAACCCACCACCCCCACACCCCA 4100 
EHPGRI HLIETHHPHTP 
CTCCCCCTCACCCAACTCACCACCCTCCACCAACCCCACCTACGCCTCAC 4150 

LPLTQLTTL HQPH LRLT 
CAACAACACCCTCCACACCCCCCACCTCACCCCCATCACCACCCACCACA 4200 

NNTLH T PHLT P I TT H .H 
ACACCACC AC AACCACCCCCAACACCCCACCCCTCAACCCCAACCACGCC 4250 
NTTTTTPNTPPLNPNHA 
ATCCTCATCACCGGCGGCTCCGGCACCCTCGCCGGCATCCTCGCCCGCCA 4300 

ILITGGSGTLAGILARH 
CCTCAACCACCCCCACACCTACCTCCTCTCCCGCACACCACCACCCCCCA 4350 
5 LNHPHTYLLSRTPPPP 

CCACACCCGGCACCCACATCCCCTGCGACCTCACCGACCCCACCCAAATC 4 4 00 
TTPGTHIPCD L'TDPTQI 
ACCCAAGCCCTCACCCACATACCACAACCCCTCACCGGCATCTTCCACAC 4 4 50 
T Q A L T H I P Q P L T G I F H T 
10 CGCCGCCACCCTCGACGACGCCACCCTCACCAACCTCACCCCCCAACACC 4500 
A A T L D-D A T L TN L T P Q H 
TCACCACCACCCTCCAACCCAAAGCCGACGCCGCCTGGCACCTCCACCAC 4 550 
LTT TLQPKADA'AWHLHH 
CACACCCAAAACCAACCCCTCACCCACTTCGTCCTCTACTCCAGCGCCGC 4 600 
15 H-TQNQPLTH FVL YSSAA 

CGCCACCCTCGGCAGCCCCGGCCAAGCCAACTACGCCGCCGCCAACGCCT 4 650 

ATLGS PGQANYAAANA 
TCCTCGACGCCCTCGCCACCCACCGCCACACCCAAGGACAACCCGCCACC 4 700 
FLDALATH RHTQGQPAT 
20 ACCATCGCCTGGGGCATGTGGCACACCACCACCACACTCACCAGCCAACT 4 750 
TIAWGMWHTTTTLTSQL 
CACCGACAGCGACCGCGACCGCATCCGCCGCGGCGGCTTCCTGCCGATCT 4 800 

TDSDRDRIRRGGFLP I 
CGGACGACGAGGGCATGC 
25 S D D E G M 

Example 3 

Recombinant PKS Genes for 13-desmethoxy FK-506 and FK-520 
The present invention provides a variety of recombinant PKS genes, in addition to 
those described in Examples 1 and 2, for producing 13-desmethoxy FK-506 and FK-520 
compounds. This Example provides the construction protocols for recombinant FK-520 
and FK-506 (from Streptomyces sp. MA6858 (ATCC 55098), described in U.S. Patent 
No. 5,116,756, incorporated herein by reference) PKS genes in which the module 8 AT 
coding sequences have been replaced by the rapAT3 (the AT domain from module 
3 of the rapamycin PKS), rapAT12, eryAT1 (the AT domain from module 1 of the 
erythromycin (DEBS) PKS), or eryAT2 coding sequences. Each of these constructs 
provides a PKS that produces the 13-desmethoxy-13-methyl derivative, except for the 
rapAT12 replacement, which provides the 13-desmethoxy derivative, i.e., it has a 
hydrogen where the other derivatives have methyl. 

Figure 7 shows the process used to generate the AT replacement constructs.
First, a fragment of ~4.5 kb containing module 8 coding sequences from the FK-520
cluster of ATCC 14891 was cloned using the convenient restriction sites SacI and SphI
(Step A in Figure 7). The choice of restriction sites used to clone a 4.0 - 4.5 kb fragment
comprising module 8 coding sequences from other FK-520 or FK-506 clusters can be
different depending on the DNA sequence, but the overall scheme is identical. The
unique SacI and SphI restriction sites at the ends of the FK-520 module 8 fragment were
then changed to unique BglII and NsiI sites by ligation to synthetic linkers (described in
the preceding Examples, see Step B of Figure 7). Fragments containing sequences 5' and
3' of the AT8 sequences were then amplified using primers, described above, that
introduced either an AvrII site or an NheI site at two different KS/AT boundaries and an
XhoI site at the AT/DH boundary (Step C of Figure 7). Heterologous AT domains from
the rapamycin and erythromycin gene clusters were amplified using primers, as
described above, that introduced the same sites as just described (Step D of Figure 7).
The fragments were ligated to give hybrid modules with in-frame fusions at the KS/AT
and AT/DH boundaries (Step E of Figure 7). Finally, these hybrid modules were ligated
into the BamHI and PstI sites of the KC515 vector. The resulting recombinant phage
were used to transform the FK-506 and FK-520 producer strains to yield the desired
recombinant cells, as described in the preceding Examples.
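The final ligation step relies on BglII and NsiI ends being compatible with BamHI and PstI ends, respectively. The following minimal sketch makes that compatibility explicit. The recognition sequences and cut positions are the commonly published values for these enzymes and are supplied here as assumptions; they are not stated in this Example, and the function names are hypothetical.

```python
# Recognition sequence (top strand, 5'->3') and the number of bases to the
# left of the top-strand cut. Values are standard published ones, supplied
# here as assumptions rather than taken from this document.
ENZYMES = {
    "BglII": ("AGATCT", 1),
    "BamHI": ("GGATCC", 1),
    "NsiI": ("ATGCAT", 5),
    "PstI": ("CTGCAG", 5),
    "AvrII": ("CCTAGG", 1),
    "NheI": ("GCTAGC", 1),
    "XhoI": ("CTCGAG", 1),
}


def overhang(enzyme: str) -> tuple[str, str]:
    """Return (overhang type, overhang sequence) for a palindromic site."""
    site, top_cut = ENZYMES[enzyme]
    bottom_cut = len(site) - top_cut  # symmetric cut on the opposite strand
    start, end = sorted((top_cut, bottom_cut))
    kind = "5'" if top_cut < bottom_cut else "3'"
    return kind, site[start:end]


def compatible(enzyme_a: str, enzyme_b: str) -> bool:
    """True when the two enzymes leave identical cohesive ends."""
    return overhang(enzyme_a) == overhang(enzyme_b)


if __name__ == "__main__":
    print("BglII/BamHI compatible:", compatible("BglII", "BamHI"))
    print("NsiI/PstI compatible:", compatible("NsiI", "PstI"))
```

Under these assumptions, BglII and BamHI both leave a 5' GATC overhang and NsiI and PstI both leave a 3' TGCA overhang, which is why the hybrid module carrying BglII and NsiI ends can be inserted into the BamHI and PstI sites of the vector.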

The following table shows the location and sequences surrounding the engineered 
site of each of the heterologous AT domains employed. The FK-506 hybrid construct 
was used as a control for the FK-520 recombinant cells produced, and a similar FK-520 
hybrid construct was used as a control for the FK-506 recombinant cells. 



Heterologous AT        Enzyme   Location of Engineered Site

FK-506 AT8             AvrII    GGCCGTccgcgcCGTGCGGCGGTCTCGTCGTTC
(hydroxymalonyl)                GRPRRAAVSSF
                       NheI     ACCCAGCATCCCGCGATGGGTGAGCGgctcgcC
                                TQHPAMGERLA
                       XhoI     TACGCCTTCCAGCGGCGGCCCTACTGGatcgag
                                YAFQRRPYWIE

rapamycin AT3          AvrII    GACCGGccccgtCGGGCGGGCGTGTCGTCCTTC
(methylmalonyl)                 DRPRRAGVSSF
                       NheI     TGGCAGTGGCTGGGGATGGGCAGTGCcctgcgG
                                WQWLGMGSALR
                       XhoI     TACGCCTTCCAACACCAGCGGTACTGGgtcgag
                                YAFQHQRYWVE

rapamycin AT12         AvrII    GGCCGAgcgcgcCGGGCAGGCGTGTCGTCCTTC
(malonyl)                       GRARRAGVSSF
                       NheI     TCGCAGCGTGCTGGCATGGGTGAGGAactggcC
                                SQRAGMGEELA
                       XhoI     TACGCCTTCCAGCACCAGCGCTACTGGctcgag
                                YAFQHQRYWLE

DEBS AT1               AvrII    GCGCGAccgcgcCGGGCGGGGGTCTCGTCGTTC
(methylmalonyl)                 ARPRRAGVSSF
                       NheI     TGGCAGTGGGCGGGCATGGCCGTCGAcctgctC
                                WQWAGMAVDLL
                       XhoI     TACCCGTTCCAGCGCGAGCGCGTCTGGctcgaa
                                YPFQRERVWLE

DEBS AT2               AvrII    GACGGGgtgcgcCGGGCAGGTGTGTCGGCGTTC
(methylmalonyl)                 DGVRRAGVSAF
                       NheI     GCCCAGTGGGAAGGCATGGCGCGGGAgttgttG
                                AQWEGMARELL
                       XhoI     TATCCTTTCCAGGGCAAGCGGTTCTGGctgctg
                                YPFQGKRFWLL

The sequences shown below provide the location of the KS/AT boundaries
chosen in the FK-520 module 8 coding sequences. Regions where AvrII and NheI sites
were engineered are indicated by lower case and underlining.

CCGGCGCCGTCGAACTGCTGACGTCGGCCCGGCCGTGGCCCGAGACCGACCGGccacggC
AGAVELLTSARPWPETDRPR
GTGCCGCCGTCTCCTCGTTCGGGGTGAGCGGCACCAACGCCCACGTCATCCTGGAGGCCG
RAAVSSFGVSGTNAHVILEA
GACCGGTAACGGAGACGCCCGCGGCATCGCCTTCCGGTGACCTTCCCCTGCTGGTGTCGG
GPVTETPAASPSGDLPLLVS
CACGCTCACCGGAAGCGCTCGACGAGCAGATCCGCCGACTGCGCGCCTACCTGGACACCA
ARSPEALDEQIRRLRAYLDT
CCCCGGACGTCGACCGGGTGGCCGTGGCACAGACGCTGGCCCGGCGCACACACTTCGCCC
TPDVDRVAVAQTLARRTHFA
ACCGCGCCGTGCTGCTCGGTGACACCGTCATCACCACACCCCCCGCGGACCGGCCCGACG
HRAVLLGDTVITTPPADRPD
AACTCGTCTTCGTCTACTCCGGCCAGGGCACCCAGCATCCCGCGATGGGCGAGCAgctcg
ELVFVYSGQGTQHPAMGEQL
cCGCCGCCCATCCCGTGTTCGCCGACGCCTGGCATGAAGCGCTCCGCCGCCTTGACAACC
AAAHPVFADAWHEALRRLDN

The sequences shown below provide the location of the AT/DH boundary chosen
in the FK-520 module 8 coding sequences. The region where an XhoI site was
engineered is indicated by lower case and underlining.

TCCTCGGGGCTGGGTCACGGCACGACGCGGATGTGCCCGCGTACGCGTTCCAACGGCGGC
ILGAGSRHDADVPAYAFQRR
ACTACTGGatcgagTCGGCACGCCCGGCCGCATCCGACGCGGGCCACCCCGTGCTGGGCT
HYWIESARPAASDAGHPVLG

The sequences shown below provide the location of the KS/AT boundaries
chosen in the FK-506 module 8 coding sequences. Regions where AvrII and NheI sites
were engineered are indicated by lower case and underlining.

TCGGCCAGGCCGTGGCCGCGGACCGGCCGTccgcgcCGTGCGGCGGTCTCGTCGTTCGGG
SARPWPRTGRPRRAAVSSFG
GTGAGCGGCACCAACGCCCACATCATCCTGGAGGCCGGACCCGACCAGGAGGAGCCGTCG
VSGTNAHIILEAGPDQEEPS
GCAGAACCGGCCGGTGACCTCCCGCTGCTCGTGTCGGCACGGTCCCCGGAGGCACTGGAC
AEPAGDLPLLVSARSPEALD
GAGCAGATCGGGCGCCTGCGCGACTATCTCGACGCCGCCCCCGGCGTGGACCTGGCGGCC
EQIGRLRDYLDAAPGVDLAA
GTGGCGCGGACACTGGCCACGCGTACGCACTTCTCCCACCGCGCCGTACTGCTCGGTGAC
VARTLATRTHFSHRAVLLGD
ACCGTCATCACCGCTCCCCCCGTGGAACAGCCGGGCGAGCTCGTCTTCGTCTACTCGGGA
TVITAPPVEQPGELVFVYSG
CAGGGCACCCAGCATCCCGCGATGGGTGAGCGgctcgcCGCAGCCTTCCCCGTGTTCGCC
QGTQHPAMGERLAAAFPVFA
GACCCGGACGTACCCGCCTACGCCTTCCAGCGGCGGCCCTACTGGATCGAGTCCGCGCCG
DPDVPAYAFQRRPYWIESAP

The sequences shown below provide the location of the AT/DH boundary chosen
in the FK-506 module 8 coding sequences. The region where an XhoI site was
engineered is indicated by lower case and underlining.

GACCCGGACGTACCCGCCTACGCCTTCCAGCGGCGGCCCTACTGGatcgagTCCGCGCCG
DPDVPAYAFQRRPYWIESAP
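The pairing of each DNA line with its amino acid line can be checked with a short translation sketch using the standard genetic code. The sketch also substitutes the canonical recognition sequence for the lower-case region to illustrate how an engineered site can preserve, or minimally change, the encoded residues. The assumption that the engineered sequence is exactly the canonical site laid over the lower-case run is ours; the mutated sequences are not given in this Example, and the function names are hypothetical.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def engineer_site(dna: str, site: str) -> str:
    """Replace the first lower-case run in `dna` with `site` (assumed design)."""
    start = next(i for i, ch in enumerate(dna) if ch.islower())
    end = start
    while end < len(dna) and dna[end].islower():
        end += 1
    if end - start != len(site):
        raise ValueError("site length does not match the lower-case region")
    return dna[:start] + site.lower() + dna[end:]


# FK-506 module 8 boundary sequences as tabulated in this Example.
FK506_AVRII = "GGCCGTccgcgcCGTGCGGCGGTCTCGTCGTTC"
FK506_NHEI = "ACCCAGCATCCCGCGATGGGTGAGCGgctcgcC"
FK506_AT_DH = "GACCCGGACGTACCCGCCTACGCCTTCCAGCGGCGGCCCTACTGGatcgagTCCGCGCCG"

if __name__ == "__main__":
    print(translate(FK506_AT_DH))
    print(translate(engineer_site(FK506_AT_DH, "CTCGAG")))   # XhoI
    print(translate(engineer_site(FK506_AVRII, "CCTAGG")))   # AvrII
    print(translate(engineer_site(FK506_NHEI, "GCTAGC")))    # NheI
```

Under these assumptions the AvrII and NheI substitutions are silent in the FK-506 sequences, while the XhoI substitution changes the isoleucine at the AT/DH boundary to leucine.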
Example 4

Replacement of Methoxyl with Hydrogen or Methyl at C-15 of FK-506 and FK-520
The methods and reagents of the present invention also provide novel FK-506
and FK-520 derivatives in which the methoxy group at C-15 is replaced by a hydrogen or
methyl. These derivatives are produced in recombinant host cells of the invention that
express recombinant PKS enzymes that produce the derivatives. These recombinant PKS
enzymes are prepared in accordance with the methodology of Examples 1 and 2, with the
exception that the AT domain of module 7, instead of module 8, is replaced. Moreover, the
present invention provides recombinant PKS enzymes in which the AT domains of both
modules 7 and 8 have been changed. The table below summarizes the various
compounds provided by the present invention.





Compound   C-13       C-15       Derivative Provided

FK-506     hydrogen   hydrogen   13,15-didesmethoxy-FK-506
FK-506     hydrogen   methoxy    13-desmethoxy-FK-506
FK-506     hydrogen   methyl     13,15-didesmethoxy-15-methyl-FK-506
FK-506     methoxy    hydrogen   15-desmethoxy-FK-506
FK-506     methoxy    methoxy    Original compound (FK-506)
FK-506     methoxy    methyl     15-desmethoxy-15-methyl-FK-506
FK-506     methyl     hydrogen   13,15-didesmethoxy-13-methyl-FK-506
FK-506     methyl     methoxy    13-desmethoxy-13-methyl-FK-506
FK-506     methyl     methyl     13,15-didesmethoxy-13,15-dimethyl-FK-506
FK-520     hydrogen   hydrogen   13,15-didesmethoxy-FK-520
FK-520     hydrogen   methoxy    13-desmethoxy-FK-520
FK-520     hydrogen   methyl     13,15-didesmethoxy-15-methyl-FK-520
FK-520     methoxy    hydrogen   15-desmethoxy-FK-520
FK-520     methoxy    methoxy    Original compound (FK-520)
FK-520     methoxy    methyl     15-desmethoxy-15-methyl-FK-520
FK-520     methyl     hydrogen   13,15-didesmethoxy-13-methyl-FK-520
FK-520     methyl     methoxy    13-desmethoxy-13-methyl-FK-520
FK-520     methyl     methyl     13,15-didesmethoxy-13,15-dimethyl-FK-520
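The entries of this table follow a regular pattern: three possible substituents at each of C-13 and C-15, for each of the two parent compounds. The following sketch enumerates the combinations and derives the names; the naming helper is an illustrative assumption that reproduces the pattern of the table rather than a general nomenclature tool.

```python
from itertools import product

PARENTS = ("FK-506", "FK-520")
SUBSTITUENTS = ("hydrogen", "methoxy", "methyl")


def derivative_name(parent: str, c13: str, c15: str) -> str:
    """Name a derivative from its C-13 and C-15 substituents (hypothetical helper)."""
    positions = (("13", c13), ("15", c15))
    removed = [pos for pos, sub in positions if sub != "methoxy"]
    methylated = [pos for pos, sub in positions if sub == "methyl"]
    if not removed:
        return parent  # both methoxy groups retained: the original compound
    name = ",".join(removed)
    name += "-didesmethoxy" if len(removed) == 2 else "-desmethoxy"
    if methylated:
        name += "-" + ",".join(methylated)
        name += "-dimethyl" if len(methylated) == 2 else "-methyl"
    return f"{name}-{parent}"


def all_derivatives():
    """List (parent, C-13, C-15, name) for every combination in the table."""
    return [
        (parent, c13, c15, derivative_name(parent, c13, c15))
        for parent, c13, c15 in product(PARENTS, SUBSTITUENTS, SUBSTITUENTS)
    ]


if __name__ == "__main__":
    for row in all_derivatives():
        print("{:8s}{:10s}{:10s}{}".format(*row))
```

The enumeration yields eighteen rows, sixteen of which are new derivatives and two of which are the original compounds.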



Example 5

Replacement of Methoxyl with Ethyl at C-13 and/or C-15 of FK-506 and FK-520

The present invention also provides novel FK-506 and FK-520 derivative
compounds in which the methoxy groups at either or both the C-13 and C-15 positions
are instead ethyl groups. These compounds are produced by novel PKS enzymes of the
invention in which the AT domains of modules 8 and/or 7 are converted to ethylmalonyl
specific AT domains by modification of the PKS gene that encodes the module.
Ethylmalonyl specific AT domain coding sequences can be obtained from, for example,
the FK-520 PKS genes, the niddamycin PKS genes, and the tylosin PKS genes. The
novel PKS genes of the invention include not only those in which either or both of the
AT domains of modules 7 and 8 have been converted to ethylmalonyl specific AT
domains but also those in which one of the modules is converted to an ethylmalonyl
specific AT domain and the other is converted to a malonyl specific or a methylmalonyl
specific AT domain.



Example 6

Neurotrophic Compounds

The compounds described in Examples 1-4, inclusive, have immunosuppressant
activity and can be employed as immunosuppressants in a manner and in formulations
similar to those employed for FK-506. The compounds of the invention are generally
effective for the prevention of organ rejection in patients receiving organ transplants and
in particular can be used for immunosuppression following orthotopic liver
transplantation. These compounds also have pharmacokinetic properties and metabolism
that are more advantageous for certain applications relative to those of FK-506 or
FK-520. These compounds are also neurotrophic; however, for use as neurotrophins, it is
desirable to modify the compounds to diminish or abolish their immunosuppressant
activity. This can be readily accomplished by hydroxylating the compounds at the C-18
position using established chemical methodology or novel FK-520 PKS genes provided
by the present invention.

Thus, in one aspect, the present invention provides a method for stimulating
nerve growth that comprises administering a therapeutically effective dose of
18-hydroxy-FK-520. In another embodiment, the compound administered is a
C-18,20-dihydroxy-FK-520 derivative. In another embodiment, the compound administered is a
C-13-desmethoxy and/or C-15-desmethoxy 18-hydroxy-FK-520 derivative. In another
embodiment, the compound administered is a C-13-desmethoxy and/or
C-15-desmethoxy 18,20-dihydroxy-FK-520 derivative. In other embodiments, the compounds
are the corresponding analogs of FK-506. The 18-hydroxy compounds of the invention
can be prepared chemically, as described in U.S. Patent No. 5,189,042, incorporated
herein by reference, or by fermentation of a recombinant host cell provided by the
present invention that expresses a recombinant PKS in which the module 5 DH domain
has been deleted or rendered non-functional.

The chemical methodology is as follows. A compound of the invention (~200
mg) is dissolved in 3 mL of dry methylene chloride and added to 45 µL of 2,6-lutidine,
and the mixture stirred at room temperature. After 10 minutes, tert-butyldimethylsilyl
trifluoromethanesulfonate (64 µL) is added by syringe. After 15 minutes, the reaction
mixture is diluted with ethyl acetate, washed with saturated bicarbonate, washed with
brine, and the organic phase dried over magnesium sulfate. Removal of solvent in vacuo
and flash chromatography on silica gel (ethyl acetate:hexane (1:2) plus 1% methanol)
gives the protected compound, which is dissolved in 95% ethanol (2.2 mL) and to which
is added 53 µL of pyridine, followed by selenium dioxide (58 mg). The flask is fitted
with a water condenser and heated to 70°C on a mantle. After 20 hours, the mixture is
cooled to room temperature, filtered through diatomaceous earth, and the filtrate poured
into a saturated sodium bicarbonate solution. This is extracted with ethyl acetate, and the
organic phase is washed with brine and dried over magnesium sulfate. The solution is
concentrated and purified by flash chromatography on silica gel (ethyl acetate:hexane
(1:2) plus 1% methanol) to give the protected 18-hydroxy compound. This compound is
dissolved in acetonitrile and treated with aqueous HF to remove the protecting groups.
After dilution with ethyl acetate, the mixture is washed with saturated bicarbonate and
brine, dried over magnesium sulfate, filtered, and evaporated to yield the 18-hydroxy
compound. Thus, the present invention provides the C-18-hydroxyl derivatives of the
compounds described in Examples 1-4.

Those of skill in the art will recognize that other suitable chemical procedures can
be used to prepare the novel 18-hydroxy compounds of the invention. See, e.g., Kawai et
al., Jan. 1993, Structure-activity profiles of macrolactam immunosuppressant FK-506
analogues, FEBS Letters 376(2): 107-113, incorporated herein by reference. These
methods can be used to prepare both the C18-[S]-OH and C18-[R]-OH enantiomers, with
the R enantiomer showing a somewhat lower IC50, which may be preferred in some
applications. See Kawai et al., supra. Another preferred protocol is described in Umbreit
and Sharpless, 1977, JACS 99(16): 1526-28, although it may be preferable to use 30
equivalents each of SeO2 and t-BuOOH rather than the 0.02 and 3-4 equivalents,
respectively, described in that reference.

All scientific and patent publications referenced herein are hereby incorporated
by reference. The invention having now been described by way of written description
and example, those of skill in the art will recognize that the invention can be practiced in
a variety of embodiments, and that the foregoing description and examples are for purposes of
illustration and not limitation of the following claims.

Claims

1. An isolated nucleic acid that encodes a CoA ligase, a non-ribosomal peptide 
synthetase, or a domain of an extender module of a polyketide synthase enzyme that 
synthesizes FK-520. 

2. The isolated nucleic acid of claim 1 that encodes an extender module, said 
module comprising a ketosynthase domain, an acyl transferase domain, and an acyl 
carrier protein domain. 

3. The isolated nucleic acid of claim 1 that encodes an open reading frame, said 
open reading frame comprising coding sequences for two or more extender modules, 
each extender module comprising a ketosynthase domain, an acyl transferase domain, 
and an acyl carrier protein domain. 

4. The isolated nucleic acid of claim 1 that encodes a gene cluster, said gene 
cluster comprising two or more open reading frames, each of said open reading frames 
comprising coding sequences for two or more extender modules, each of said extender 
modules comprising a ketosynthase domain, an acyl transferase domain, and an acyl 
carrier protein domain. 

5. The isolated nucleic acid of claim 2, wherein at least one of said domains is a 
domain of a module of a non-FK-520 polyketide synthase. 

6. The isolated nucleic acid of claim 1, wherein said nucleic acid is a 
recombinant vector capable of replication in or integration into the chromosome of a host 
cell. 

7. The isolated nucleic acid of claim 6 that is selected from the group consisting 
of cosmid pKOS034-120, cosmid pKOS034-124, cosmid pKOS065-M27, and cosmid 
pKOS065-M21. 

8. The isolated nucleic acid of claim 5, wherein said non-FK-520 polyketide
synthase is rapamycin polyketide synthase, FK-506 polyketide synthase, or erythromycin
polyketide synthase.

9. A method of preparing a polyketide, said method comprising transforming a
host cell with a recombinant DNA vector of claim 6, and culturing said host cell under
conditions such that said polyketide synthase is produced and catalyzes synthesis of said
polyketide.

10. The method of claim 9, wherein said host cell is a Streptomyces host cell. 

11. The method of claim 9, wherein said polyketide is selected from the group 
consisting of FK-520, 13-desmethoxy-FK-520, and 13-desmethoxy-FK-506. 

12. A recombinant host cell that expresses a recombinant polyketide synthase 
selected from the group consisting of: (i) an FK-520 polyketide synthase in which at 
least one AT domain is replaced by an AT domain of a non-FK-520 polyketide synthase; 
(ii) an FK-506 polyketide synthase in which at least one AT domain is replaced by an 
AT domain of a non-FK-506 polyketide synthase; (iii) an FK-520 polyketide synthase in 
which at least one DH domain has been deleted; (iv) an FK-506 polyketide synthase in 
which at least one DH domain has been deleted. 

13. The recombinant host cell of claim 12 that expresses an FK-520 polyketide 
synthase in which an AT domain of module 8 has been replaced by an AT domain that 
binds malonyl CoA, methylmalonyl CoA, or ethylmalonyl CoA. 

14. The recombinant host cell of claim 12 that expresses an FK-506 polyketide 
synthase in which an AT domain of module 8 has been replaced by an AT domain that 
binds malonyl CoA, methylmalonyl CoA, or ethylmalonyl CoA. 

15. The recombinant host cell of claim 13, wherein a DH domain of module 5 or
module 6 has been deleted.

16. The recombinant host cell of claim 14, wherein a DH domain of module 5 or
module 6 has been deleted.

17. A recombinant host cell that comprises recombinant genes coding for
enzymes sufficient for synthesis of ethylmalonyl CoA or 2-hydroxymalonyl CoA.

18. A polyketide having the structure

[chemical structure drawing not reproduced]

wherein R1 is hydrogen, methyl, ethyl, or allyl; R2 is hydrogen or hydroxyl, provided
that when R2 is hydrogen, there is a double bond between C-20 and C-19; R3 is hydrogen
or hydroxyl; R4 is methoxyl, hydrogen, methyl, or ethyl; and R5 is methoxyl, hydrogen,
methyl, or ethyl; but not including FK-506, FK-520, 18-hydroxy-FK-520, and
18-hydroxy-FK-506.

19. The polyketide of claim 18 that is 13-desmethoxy-FK-506.

20. The polyketide of claim 18 that is 13-desmethoxy-18-hydroxy-FK-520.

POLYKETIDE SYNTHASE ENZYMES AND RECOMBINANT DNA CONSTRUCTS

THEREFOR 



Field of the Invention 
The present invention relates to polyketides and the polyketide synthase (PKS) 
enzymes that produce them. The invention also relates generally to genes encoding PKS 
enzymes and to recombinant host cells containing such genes and in which expression of 
such genes leads to the production of polyketides. The present invention also relates to 
compounds useful as medicaments having immunosuppressive and/or neurotrophic activity. 
Thus, the invention relates to the fields of chemistry, molecular biology, and agricultural, 
medical, and veterinary technology. 

Background of the Invention 

Polyketides are a class of compounds synthesized from 2-carbon units through a 
series of condensations and subsequent modifications. Polyketides occur in many types of 
organisms, including fungi and mycelial bacteria, in particular, the actinomycetes. 
Polyketides are biologically active molecules with a wide variety of structures, and the class 
encompasses numerous compounds with diverse activities. Tetracycline, erythromycin, 
epothilone, FK-506, FK-520, narbomycin, picromycin, rapamycin, spinocyn, and tylosin are 
examples of polyketides. Given the difficulty in producing polyketide compounds by 
traditional chemical methodology, and the typically low production of polyketides in wild- 
type cells, there has been considerable interest in finding improved or alternate means to 
produce polyketide compounds. 

This interest has resulted in the cloning, analysis, and manipulation by recombinant 
DNA technology of genes that encode PKS enzymes. The resulting technology allows one 
to manipulate a known PKS gene cluster either to produce the polyketide synthesized by 
that PKS at higher levels than occur in nature or in hosts that otherwise do not produce the 
polyketide. The technology also allows one to produce molecules that are structurally 
related to, but distinct from, the polyketides produced from known PKS gene clusters. See, 
e.g., PCT publication Nos. WO 93/13663; 95/08548; 96/40968; 97/02358; 98/27203; and
98/49315; United States Patent Nos. 4,874,748; 5,063,155; 5,098,837; 5,149,639;
5,672,491; 5,712,146; 5,830,750; and 5,843,718; and Fu et al., 1994, Biochemistry 33:
9321-9326; McDaniel et al., 1993, Science 262: 1546-1550; and Rohr, 1995, Angew. Chem.
Int. Ed. Engl. 34(8): 881-888, each of which is incorporated herein by reference.

Polyketides are synthesized in nature by PKS enzymes. These enzymes, which are
complexes of multiple large proteins, are similar to the synthases that catalyze condensation
of 2-carbon units in the biosynthesis of fatty acids. PKSs catalyze the biosynthesis of
polyketides through repeated, decarboxylative Claisen condensations between acylthioester
building blocks. The building blocks used to form complex polyketides are typically
acylthioesters, such as acetyl, butyryl, propionyl, malonyl, hydroxymalonyl,
methylmalonyl, and ethylmalonyl CoA. Other building blocks include amino acid-like
acylthioesters. PKS enzymes that incorporate such building blocks include an activity that
functions as an amino acid ligase (an AMP ligase) or as a non-ribosomal peptide synthetase
(NRPS). Two major types of PKS enzymes are known; these differ in their composition and
mode of synthesis of the polyketide synthesized. These two major types of PKS enzymes
are commonly referred to as Type I or "modular" and Type II or "iterative" PKS enzymes.

In the Type I or modular PKS enzyme group, a set of separate catalytic active sites
(each active site is termed a "domain", and a set thereof is termed a "module") exists for
each cycle of carbon chain elongation and modification in the polyketide synthesis pathway.
The typical modular PKS is composed of several large polypeptides, which can be
segregated from amino to carboxy termini into a loading module, multiple extender
modules, and a releasing (or thioesterase) domain. The PKS enzyme known as
6-deoxyerythronolide B synthase (DEBS) is a Type I PKS. In DEBS, there is a loading
module, six extender modules, and a thioesterase (TE) domain. The loading module, six
extender modules, and TE of DEBS are present on three separate proteins (designated
DEBS-1, DEBS-2, and DEBS-3, with two extender modules per protein). Each of the
DEBS polypeptides is encoded by a separate open reading frame (ORF) or gene; these
genes are known as eryAI, eryAII, and eryAIII. See Caffrey et al., 1992, FEBS Letters 304:
205, and U.S. Patent No. 5,824,513, each of which is incorporated herein by reference.

Generally, the loading module is responsible for binding the first building block
used to synthesize the polyketide and transferring it to the first extender module. The
loading module of DEBS consists of an acyltransferase (AT) domain and an acyl carrier
protein (ACP) domain. Another type of loading module utilizes an inactivated ketosynthase
(KS) domain and AT and ACP domains. This inactivated KS is in some instances called
KS^Q, where the superscript letter is the abbreviation for the amino acid, glutamine, that is
present instead of the active site cysteine required for ketosynthase activity. In other PKS
enzymes, including the FK-506 PKS, the loading module incorporates an unusual starter
unit and is composed of a CoA ligase-like activity domain. In any event, the loading module
recognizes a particular acyl-CoA (usually acetyl or propionyl but sometimes butyryl or
other acyl-CoA) and transfers it as a thiol ester to the ACP of the loading module.

The AT on each of the extender modules recognizes a particular extender-CoA
(malonyl or alpha-substituted malonyl, i.e., methylmalonyl, ethylmalonyl, and
2-hydroxymalonyl) and transfers it to the ACP of that extender module to form a thioester.
Each extender module is responsible for accepting a compound from a prior module,
binding a building block, attaching the building block to the compound from the prior
module, optionally performing one or more additional functions, and transferring the
resulting compound to the next module.

Each extender module of a modular PKS contains a KS, AT, ACP, and zero, one,
two, or three domains that modify the beta-carbon of the growing polyketide chain. A
typical (non-loading) minimal Type I PKS extender module is exemplified by extender
module three of DEBS, which contains a KS domain, an AT domain, and an ACP domain.
These three domains are sufficient to activate a 2-carbon extender unit and attach it to the
growing polyketide molecule. The next extender module, in turn, is responsible for
attaching the next building block and transferring the growing compound to the next
extender module until synthesis is complete.

Once the PKS is primed with acyl- and malonyl-ACPs, the acyl group of the loading
module is transferred to form a thiol ester (trans-esterification) at the KS of the first
extender module; at this stage, extender module one possesses an acyl-KS and a malonyl (or
substituted malonyl) ACP. The acyl group derived from the loading module is then
covalently attached to the alpha-carbon of the malonyl group to form a carbon-carbon bond,
driven by concomitant decarboxylation, and generating a new acyl-ACP that has a backbone
two carbons longer than the loading building block (elongation or extension).

The polyketide chain, growing by two carbons each extender module, is sequentially
passed as covalently bound thiol esters from extender module to extender module, in an
assembly line-like process. The carbon chain produced by this process alone would possess
a ketone at every other carbon atom, producing a polyketone, from which the name
polyketide arises. Most commonly, however, additional enzymatic activities modify the beta
keto group of each two carbon unit just after it has been added to the growing polyketide
chain but before it is transferred to the next module.
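Because each extender module lengthens the backbone by two carbons, the backbone length of the product follows directly from the starter unit and the number of extender modules. A minimal sketch of this arithmetic is given below; the function name is hypothetical, side-chain carbons are not counted, and the use of a propionyl starter for DEBS in the usage example is an assumption based on general knowledge of that enzyme rather than on this passage.

```python
# Carbons contributed to the backbone by common starter units.
STARTER_CARBONS = {"acetyl": 2, "propionyl": 3, "butyryl": 4}

CARBONS_PER_EXTENSION = 2  # each extender module adds a two-carbon unit


def backbone_carbons(starter: str, extender_modules: int) -> int:
    """Backbone carbon count of the full-length chain (side chains excluded)."""
    if extender_modules < 0:
        raise ValueError("number of extender modules cannot be negative")
    return STARTER_CARBONS[starter] + CARBONS_PER_EXTENSION * extender_modules


if __name__ == "__main__":
    # DEBS: six extender modules; a propionyl starter is assumed here.
    print(backbone_carbons("propionyl", 6))
```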

Thus, in addition to the minimal module containing KS, AT, and ACP domains
necessary to form the carbon-carbon bond, and as noted above, other domains that modify
the beta-carbonyl moiety can be present. Thus, modules may contain a ketoreductase (KR)
domain that reduces the keto group to an alcohol. Modules may also contain a KR domain
plus a dehydratase (DH) domain that dehydrates the alcohol to a double bond. Modules may
also contain a KR domain, a DH domain, and an enoylreductase (ER) domain that converts
the double bond product to a saturated single bond using the beta carbon as a methylene
function. An extender module can also contain other enzymatic activities, such as, for
example, a methylase or dimethylase activity.
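The correspondence between the reductive domains present in a module and the resulting state of the beta-carbon can be captured in a small lookup. The sketch below covers only the four domain combinations described above; the function name and the error handling for other combinations are illustrative assumptions.

```python
MINIMAL_DOMAINS = {"KS", "AT", "ACP"}

# Reductive domain sets described in the text and the resulting beta-carbon.
BETA_CARBON_STATE = {
    frozenset(): "ketone",
    frozenset({"KR"}): "alcohol",
    frozenset({"KR", "DH"}): "double bond",
    frozenset({"KR", "DH", "ER"}): "methylene",
}


def beta_carbon_state(domains) -> str:
    """Predict the beta-carbon state produced by an extender module."""
    present = set(domains)
    if not MINIMAL_DOMAINS <= present:
        raise ValueError("an extender module needs KS, AT, and ACP domains")
    reductive = frozenset(present - MINIMAL_DOMAINS)
    try:
        return BETA_CARBON_STATE[reductive]
    except KeyError:
        raise ValueError(f"combination not covered: {sorted(reductive)}") from None


if __name__ == "__main__":
    print(beta_carbon_state(["KS", "AT", "ACP"]))
    print(beta_carbon_state(["KS", "AT", "DH", "ER", "KR", "ACP"]))
```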

After traversing the final extender module, the polyketide encounters a releasing 
domain that cleaves the polyketide from the PKS and typically cyclizes the polyketide. For 
example, final synthesis of 6-dEB is regulated by a TE domain located at the end of 
extender module six. In the synthesis of 6-dEB, the TE domain catalyzes cyclization of the 
macrolide ring by formation of an ester linkage. In FK-506, FK-520, rapamycin, and similar 
polyketides, the TE activity is replaced by a RapP (for rapamycin) or RapP-like activity that
makes a linkage incorporating a pipecolic acid residue. The enzymatic activity that
catalyzes this incorporation for the rapamycin enzyme is known as RapP, encoded by the 
rapP gene. The polyketide can be modified further by tailoring enzymes; these enzymes add 
carbohydrate groups or methyl groups, or make other modifications, i.e., oxidation or 
reduction, on the polyketide core molecule. For example, 6-dEB is hydroxylated at C-6 and 
C-12 and glycosylated at C-3 and C-5 in the synthesis of erythromycin A. 

In Type I PKS polypeptides, the order of catalytic domains is conserved. When all
beta-keto processing domains are present in a module, the order of domains in that module
from N-to-C-terminus is always KS, AT, DH, ER, KR, and ACP. Some or all of the
beta-keto processing domains may be missing in particular modules, but the order of the domains
present in a module remains the same. The order of domains within modules is believed to
be important for proper folding of the PKS polypeptides into an active complex. Importantly,
there is considerable flexibility in PKS enzymes, which allows for the genetic engineering
of novel catalytic complexes. The engineering of these enzymes is achieved by modifying,
adding, or deleting domains, or replacing them with those taken from other Type I PKS
enzymes. It is also achieved by deleting, replacing, or adding entire modules with those
taken from other sources. A genetically engineered PKS complex should of course have the
ability to catalyze the synthesis of the product predicted from the genetic alterations made.
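When designing a hybrid module, the conserved domain order stated above can serve as a simple sanity check. The following sketch tests whether a proposed list of domains, read from N- to C-terminus, respects that order; the function name is hypothetical and the check says nothing about whether the resulting enzyme will fold or be active.

```python
# N-to-C-terminal order of domains in a fully reducing module, as stated above.
CANONICAL_ORDER = ("KS", "AT", "DH", "ER", "KR", "ACP")


def follows_canonical_order(domains) -> bool:
    """True if the domains appear in the conserved order, each at most once."""
    positions = [CANONICAL_ORDER.index(d) for d in domains]  # ValueError if unknown
    return all(a < b for a, b in zip(positions, positions[1:]))


if __name__ == "__main__":
    print(follows_canonical_order(["KS", "AT", "KR", "ACP"]))        # order kept
    print(follows_canonical_order(["KS", "AT", "KR", "DH", "ACP"]))  # KR before DH
```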

Alignments of the many available amino acid sequences for Type I PKS enzymes
have approximately defined the boundaries of the various catalytic domains. Sequence
alignments also have revealed linker regions between the catalytic domains and at the N-
and C-termini of individual polypeptides. The sequences of these linker regions are less
well conserved than are those for the catalytic domains, which is in part how linker regions
are identified. Linker regions can be important for proper association between domains and
between the individual polypeptides that comprise the PKS complex. One can thus view the
linkers and domains together as creating a scaffold on which the domains and modules are
positioned in the correct orientation to be active. This organization and positioning, if
retained, permits PKS domains of different or identical substrate specificities to be
substituted (usually at the DNA level) between PKS enzymes by various available
methodologies. In selecting the boundaries of, for example, an AT replacement, one can
thus make the replacement so as to retain the linkers of the recipient PKS or to replace them
with the linkers of the donor PKS AT domain, or, preferably, make both constructs to
ensure that the correct linker regions between the KS and AT domains have been included
in at least one of the engineered enzymes. Thus, there is considerable flexibility in the
design of new PKS enzymes with the result that known polyketides can be produced more
effectively, and novel polyketides useful as pharmaceuticals or for other purposes can be
made.

By appropriate application of recombinant DNA technology, a wide variety of
polyketides can be prepared in a variety of different host cells provided one has access to
nucleic acid compounds that encode PKS proteins and polyketide modification enzymes.
The present invention helps meet the need for such nucleic acid compounds by providing
recombinant vectors that encode the FK-520 PKS enzyme and various FK-520 modification
enzymes. Moreover, while the FK-506 and FK-520 polyketides have many useful activities,
there remains a need for compounds with similar useful activities but with better
pharmacokinetic profile and metabolism and fewer side-effects. The present invention helps
meet the need for such compounds as well.

Summary of the Invention

In one embodiment, the present invention provides recombinant DNA vectors that
encode all or part of the FK-520 PKS enzyme. Illustrative vectors of the invention include 
cosmid pKOS034-120, pKOS034-124, pKOS065-C31, pKOS065-C3, pKOS065-M27, and 
pKOS065-M21. The invention also provides nucleic acid compounds that encode the 
various domains of the FK-520 PKS, i.e., the KS, AT, ACP, KR, DH, and ER domains. 
These compounds can be readily used, alone or in combination with nucleic acids encoding 
other FK-520 or non-FK-520 PKS domains, as intermediates in the construction of 
recombinant vectors that encode all or part of PKS enzymes that make novel polyketides. 

The invention also provides isolated nucleic acids that encode all or part of one or 
more modules of the FK-520 PKS, each module comprising a ketosynthase activity, an acyl
transferase activity, and an acyl carrier protein activity. The invention provides an isolated 
nucleic acid that encodes one or more open reading frames of FK-520 PKS genes, said open 
reading frames comprising coding sequences for a CoA ligase activity, an NRPS activity, or 
two or more extender modules. The invention also provides recombinant expression vectors 
containing these nucleic acids. 

In another embodiment, the invention provides isolated nucleic acids that encode all 
or a part of a PKS that contains at least one module in which at least one of the domains in 
the module is a domain from a non-FK-520 PKS and at least one domain is from the FK- 
520 PKS. The non-FK-520 PKS domain or module originates from the rapamycin PKS, the 
FK-506 PKS, DEBS, or another PKS. The invention also provides recombinant expression 
vectors containing these nucleic acids. 

In another embodiment, the invention provides a method of preparing a polyketide, 
said method comprising transforming a host cell with a recombinant DNA vector that 
encodes at least one module of a PKS, said module comprising at least one FK-520 PKS 
domain, and culturing said host cell under conditions such that said PKS is produced and 
catalyzes synthesis of said polyketide. In one aspect, the method is practiced with a 
Streptomyces host cell. In another aspect, the polyketide produced is FK-520. In another 
aspect, the polyketide produced is a polyketide related in structure to FK-520. In another 
aspect, the polyketide produced is a polyketide related in structure to FK-506 or rapamycin. 

In another embodiment, the invention provides a set of genes in recombinant form 
sufficient for the synthesis of ethylmalonyl CoA in a heterologous host cell. These genes
and the methods of the invention enable one to create recombinant host cells with the ability 
to produce polyketides or other compounds that require ethylmalonyl CoA for biosynthesis. 
The invention also provides recombinant nucleic acids that encode AT domains specific for
ethylmalonyl CoA. Thus, the compounds of the invention can be used to produce 
polyketides requiring ethylmalonyl CoA in host cells that otherwise are unable to produce 
such polyketides. 

In another embodiment, the invention provides a set of genes in recombinant form
sufficient for the synthesis of 2-hydroxymalonyl CoA and 2-methoxymalonyl CoA in a 
heterologous host cell. These genes and the methods of the invention enable one to create 
recombinant host cells with the ability to produce polyketides or other compounds that 
require 2-hydroxymalonyl CoA for biosynthesis. The invention also provides recombinant 
nucleic acids that encode AT domains specific for 2-hydroxymalonyl CoA and 2- 
methoxymalonyl CoA. Thus, the compounds of the invention can be used to produce 
polyketides requiring 2-hydroxymalonyl CoA or 2-methoxymalonyl CoA in host cells that 
are otherwise unable to produce such polyketides. 

In another embodiment, the invention provides a compound related in structure to 
FK-520 or FK-506 that is useful in the treatment of a medical condition. These compounds 
include compounds in which the C-13 methoxy group is replaced by a moiety selected from 
the group consisting of hydrogen, methyl, and ethyl moieties. Such compounds are less 
susceptible to the main in vivo pathway of degradation for FK-520 and FK-506 and related 
compounds and thus exhibit an improved pharmacokinetic profile. The compounds of the 
invention also include compounds in which the C-15 methoxy group is replaced by a moiety 
selected from the group consisting of hydrogen, methyl, and ethyl moieties. The compounds 
of the invention also include the above compounds further modified by chemical 
methodology to produce derivatives such as, but not limited to, the C-18 hydroxyl 
derivatives, which have potent neurotrophic but not immunosuppressive activities.
Thus, the invention provides polyketides having the structure: 
wherein R1 is hydrogen, methyl, ethyl, or allyl; R2 is hydrogen or hydroxyl, provided that
when R2 is hydrogen, there is a double bond between C-20 and C-19; R3 is hydrogen or
hydroxyl; R4 is methoxyl, hydrogen, methyl, or ethyl; and R5 is methoxyl, hydrogen,
methyl, or ethyl; but not including FK-506, FK-520, 18-hydroxy-FK-520, and 18-hydroxy-
FK-506. The invention provides these compounds in purified form and in pharmaceutical 
compositions. 
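
The size of the substituent space defined above can be worked out with a short sketch. The substituent lists are taken directly from the definition of R1 through R5; the assumption that each of the four expressly excluded compounds corresponds to exactly one combination of substituents is made here for illustration and is not stated in the text.

```python
from itertools import product

# Substituent options exactly as recited for R1 through R5.
R1 = ("hydrogen", "methyl", "ethyl", "allyl")
R2 = ("hydrogen", "hydroxyl")
R3 = ("hydrogen", "hydroxyl")
R4 = ("methoxyl", "hydrogen", "methyl", "ethyl")
R5 = ("methoxyl", "hydrogen", "methyl", "ethyl")

# Every (R1, R2, R3, R4, R5) combination permitted by the definition.
all_combinations = list(product(R1, R2, R3, R4, R5))
total = len(all_combinations)  # 4 * 2 * 2 * 4 * 4

# Assumption: FK-506, FK-520, 18-hydroxy-FK-520, and 18-hydroxy-FK-506
# each match exactly one combination and are removed from the count.
EXCLUDED_COMPOUNDS = 4
remaining = total - EXCLUDED_COMPOUNDS

print(total, remaining)
```

Under that assumption the formula covers 256 substituent combinations, 252 of which remain after the exclusions.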

In another embodiment, the invention provides a method for treating a medical 
condition by administering a pharmaceutically efficacious dose of a compound of the
invention. The compounds of the invention may be administered to achieve
immunosuppression or to stimulate nerve growth and regeneration. 

These and other embodiments and aspects of the invention will be more fully
understood after consideration of the attached Drawings and their brief description below, 
together with the detailed description, examples, and claims that follow. 

Brief Description of the Drawings 
Figure 1 shows a diagram of the FK-520 biosynthetic gene cluster. The top line 
provides a scale in kilobase pairs (kb). The second line shows a restriction map with 
selected restriction enzyme recognition sequences indicated. K is KpnI; X is XhoI; S is SacI;
P is PstI; and E is EcoRI. The third line indicates the position of FK-520 PKS and related
genes. Genes are abbreviated with a one letter designation, i.e., C is fkbC. Immediately 
under the third line are numbered segments showing where the loading module (L) and ten 
different extender modules (numbered 1 - 10) are encoded on the various genes shown. At 
the bottom of the Figure, the DNA inserts of various cosmids of the invention (i.e., 34-124
is cosmid pKOS034-124) are shown in alignment with the FK-520 biosynthetic gene
cluster. 

Figure 2 shows the loading module (load), the ten extender modules, and the peptide 
synthetase domain of the FK-520 PKS, together with, on the top line, the genes that encode 
the various domains and modules. Also shown are the various intermediates in FK-520 
biosynthesis, as well as the structure of FK-520, with carbons 13, 15, 21, and 31 numbered.
The various domains of each module and subdomains of the loading module are also shown. 
The darkened circles showing the DH domains in modules 2, 3, and 4 indicate that the 
dehydratase domain is not functional as a dehydratase; this domain may affect the 
stereochemistry at the corresponding position in the polyketide. The substituents on the FK-
520 structure that result from the action of non-PKS enzymes are also indicated by arrows, 
together with the types of enzymes or the genes that code for the enzymes that mediate the 
action. Although the methyltransferase is shown acting at the C-13 and C-15 hydroxyl
groups after release of the polyketide from the PKS, the methyltransferase may act on the 2- 
hydroxymalonyl substrate prior to or contemporaneously with its incorporation during 
polyketide synthesis. 

Figure 3 shows a close-up view of the left end of the FK-520 gene cluster, which 
contains at least ten additional genes. The ethyl side chain on carbon 21 of FK-520 (Figure 
2) is derived from an ethylmalonyl CoA extender unit that is incorporated by an
ethylmalonyl specific AT domain in extender module 4 of the PKS. At least four of the 
genes in this region code for enzymes involved in ethylmalonyl biosynthesis. The 
polyhydroxybutyrate depolymerase is involved in maintaining hydroxybutyryl-CoA pools 
during FK-520 production. Polyhydroxybutyrate accumulates during vegetative growth and 
disappears during stationary phase in other Streptomyces (Ranade and Vining, 1993, Can. J. 
Microbiol. 39:377). Open reading frames with unknown function are indicated with a 
question mark. 

Figure 4 shows a biosynthetic pathway for the biosynthesis of ethylmalonyl CoA 
from acetoacetyl CoA consistent with the function assigned to four of the genes in the FK- 
520 gene cluster shown in Figure 3. 

Figure 5 shows a close-up view of the right-end of the FK-520 PKS gene cluster 
(and of the sequences on cosmid pKOS065-C31). The genes shown include fkbD, fkbM (a
methyl transferase that methylates the hydroxyl group on C-31 of FK-520), fkbN (a
homolog of a gene described as a regulator of cholesterol oxidase and that is believed to be 
a transcriptional activator), fkbQ (a type II thioesterase, which can increase polyketide 
production levels), and fkbS (a crotonyl-CoA reductase involved in the biosynthesis of 
ethylmalonyl CoA). 

Figure 6 shows the proposed degradative pathway for tacrolimus (FK-506)
metabolism. 

Figure 7 shows a schematic process for the construction of recombinant PKS genes 
of the invention that encode PKS enzymes that produce 13-desmethoxy FK-506 and FK- 
520 polyketides of the invention, as described in Example 4, below.

Figure 8, in Parts A and B, shows certain compounds of the invention preferred for
dermal application in Part A and a synthetic route for making those compounds in Part B. 

Detailed Description of the Invention 
Given the valuable pharmaceutical properties of polyketides, there is a need for
methods and reagents for producing large quantities of polyketides, as well as for producing 
related compounds not found in nature. The present invention provides such methods and 
reagents, with particular application to methods and reagents for producing the polyketides 
known as FK-520, also known as ascomycin or L-683,590 (see Holt et al., 1993, JACS
115:9925), and FK-506, also known as tacrolimus. Tacrolimus is a macrolide
immunosuppressant used to prevent or treat rejection of transplanted heart, kidney, liver, 
lung, pancreas, and small bowel allografts. The drug is also useful for the prevention and 
treatment of graft-versus-host disease in patients receiving bone marrow transplants, and for 
the treatment of severe, refractory uveitis. There have been additional reports of the 
unapproved use of tacrolimus for other conditions, including alopecia universalis, 
autoimmune chronic active hepatitis, inflammatory bowel disease, multiple sclerosis, 
primary biliary cirrhosis, and scleroderma. The invention provides methods and reagents for 
making novel polyketides related in structure to FK-520 and FK-506, and structurally 
related polyketides such as rapamycin. 

The FK-506 and rapamycin polyketides are potent immunosuppressants, with 
chemical structures shown below. 




[Structures: FK-506 and rapamycin]
FK-520 differs from FK-506 in that it lacks the allyl group at C-21 of FK-506, having
instead an ethyl group at that position. Its activity is similar to that of FK-506, although its
immunosuppressive activity is reduced.

These compounds act through initial formation of an intermediate complex with 
protein "immunophilins" known as FKBPs (FK-506 binding proteins), including FKBP-12.
Immunophilins are a class of cytosolic proteins that form complexes with molecules such as
FK-506, FK-520, and rapamycin that in turn serve as ligands for other cellular targets 
involved in signal transduction. Binding of FK-506, FK-520, and rapamycin to FKBP 
occurs through the structurally similar segments of the polyketide molecules, known as the 
"FKBP-binding domain" (as generally but not precisely indicated by the stippled regions in
the structures above). The FK-506-FKBP complex then binds calcineurin, while the 
rapamycin-FKBP complex binds to a protein known as RAFT-1. Binding of the FKBP- 
polyketide complex to these second proteins occurs through the dissimilar regions of the 
drugs known as the "effector" domains. 




[Scheme: Immunosuppression]




The three component FKBP-polyketide-effector complex is required for signal 
transduction and subsequent immunosuppressive activity of FK-506, FK-520, and
rapamycin. Modifications in the effector domains of FK-506, FK-520, and rapamycin that 
destroy binding to the effector proteins (calcineurin or RAFT) lead to loss of 
immunosuppressive activity, even though FKBP binding is unaffected. Further, such 
analogs antagonize the immunosuppressive effects of the parent polyketides, because they 
compete for FKBP. Such non-immunosuppressive analogs also show reduced toxicity (see
Dumont et al., 1992, Journal of Experimental Medicine 176, 751-760), indicating that much
of the toxicity of these drugs is not linked to FKBP binding.

In addition to immunosuppressive activity, FK-520, FK-506, and rapamycin have
neurotrophic activity. In the central nervous system and in peripheral nerves, immunophilins 
are referred to as "neuroimmunophilins". The neuroimmunophilin FKBP is markedly 
enriched in the central nervous system and in peripheral nerves. Molecules that bind to the 
neuroimmunophilin FKBP, such as FK-506 and FK-520, have the remarkable effect of
stimulating nerve growth. In vitro, they act as neurotrophins, i.e., they promote neurite
outgrowth in NGF-treated PC12 cells and in sensory neuronal cultures, and in intact
animals, they promote regrowth of damaged facial and sciatic nerves, and repair lesioned
serotonin and dopamine neurons in the brain. See Gold et al., Jun. 1999, J. Pharm. Exp.
Ther. 289(3): 1202-1210; Lyons et al., 1994, Proc. National Academy of Science 91: 3191-
3195; Gold et al., 1995, Journal of Neuroscience 15: 7509-7516; and Steiner et al., 1997,
Proc. National Academy of Science 94: 2019-2024. Further, the restored central and 
peripheral neurons appear to be functional. 

Compared to protein neurotrophic molecules (BDNF, NGF, etc.), the small-
molecule neurotrophins such as FK-506, FK-520, and rapamycin have different, and often
advantageous, properties. First, whereas protein neurotrophins are difficult to deliver to 
their intended site of action and may require intra-cranial injection, the small-molecule 
neurotrophins display excellent bioavailability; they are active when administered 
subcutaneously and orally. Second, whereas protein neurotrophins show quite specific
effects, the small-molecule neurotrophins show rather broad effects. Finally, whereas
protein neurotrophins often show effects on normal sensory nerves, the small-molecule 
neurotrophins do not induce aberrant sprouting of normal neuronal processes and seem to 
affect damaged nerves specifically. Neuroimmunophilin ligands have potential therapeutic 
utility in a variety of disorders involving nerve degeneration (e.g., multiple sclerosis,
Parkinson's disease, Alzheimer's disease, stroke, traumatic spinal cord and brain injury,
peripheral neuropathies). 

Recent studies have shown that the immunosuppressive and neurite outgrowth 
activity of FK-506, FK-520, and rapamycin can be separated; the neuroregenerative activity 
in the absence of immunosuppressive activity is retained by agents which bind to FKBP but
not to the effector proteins calcineurin or RAFT. See Steiner et al., 1997, Nature Medicine
3: 421-428. 
[Scheme: Nerve Regeneration]



Available structure-activity data show that the important features for neurotrophic 
activity of rapamycin, FK-520, and FK-506 lie within the common, contiguous segments of 
the macrolide ring that bind to FKBP. This portion of the molecule is termed the "FKBP
binding domain" (see Van Duyne et al., 1993, Journal of Molecular Biology 229: 105-124).
Nevertheless, the effector domains of the parent macrolides contribute to conformational 
rigidity of the binding domain and thus indirectly contribute to FKBP binding. 




"FKBP binding domain" 

There are a number of other reported analogs of FK-506, FK-520, and rapamycin that bind 
to FKBP but not the effector protein calcineurin or RAFT. These analogs show effects on
nerve regeneration without immunosuppressive effects. 

Naturally occurring FK-520 and FK-506 analogs include the antascomycins, which 
are FK-506-like macrolides that lack the functional groups of FK-506 that bind to
calcineurin (see Fehr et al., 1996, The Journal of Antibiotics 49: 230-233). These molecules
bind FKBP as effectively as does FK-506; they antagonize the effects of both FK-506 and
rapamycin, yet lack immunosuppressive activity. 
[Structure: Antascomycin A]

Other analogs can be produced by chemically modifying FK-506, FK-520, or 
rapamycin. One approach to obtaining neuroimmunophilin ligands is to destroy the effector 
binding region of FK-506, FK-520, or rapamycin by chemical modification. While the 
chemical modifications permitted on the parent compounds are quite limited, some useful 
chemically modified analogs exist. The FK-520 analog L-685,818 (ED50 = 0.7 nM for
FKBP binding; see Dumont et al., 1992), and the rapamycin analog WAY-124,466 (IC50 =
12.5 nM; see Ocain et al., 1993, Biochemical and Biophysical Research Communications 192:
1340-1346) are about as effective as FK-506, FK-520, and rapamycin at promoting
neurite outgrowth in sensory neurons (see Steiner et al., 1997).




[Structures: L-685,818 and WAY-124,466]



One of the few positions of rapamycin that is readily amenable to chemical 
modification is the allylic 16-methoxy group; this reactive group is readily exchanged by 
acid-catalyzed nucleophilic substitution. Replacement of the 16-methoxy group of 
rapamycin with a variety of bulky groups has produced analogs showing selective loss of 
immunosuppressive activity while retaining FKBP-binding (see Luengo et al., 1995,
Chemistry & Biology 2: 471-481). One of the best compounds, 1, below, shows complete 
loss of activity in the splenocyte proliferation assay with only a 10-fold reduction in binding
to FKBP. 




There are also synthetic analogs of FKBP binding domains. These compounds
reflect an approach to obtaining neuroimmunophilin ligands based on "rationally designed"
molecules that retain the FKBP-binding region in an appropriate conformation for binding 
to FKBP, but do not possess the effector binding regions. In one example, the ends of the 
FKBP binding domain were tethered by hydrocarbon chains (see Holt et al., 1993, Journal
of the American Chemical Society 115: 9925-9938); the best analog, 2, below, binds to
FKBP about as well as FK-506. In a similar approach, the ends of the FKBP binding
domain were tethered by a tripeptide to give analog 3, below, which binds to FKBP about
20-fold more weakly than FK-506. These compounds are anticipated to have neuroimmunophilin
binding activity. 




[Structures: analogs 2 and 3]

In a primate MPTP model of Parkinson's disease, administration of FKBP ligand 
GPI-1046 caused brain cells to regenerate and behavioral measures to improve. MPTP is a
neurotoxin, which, when administered to animals, selectively damages nigral-striatal
dopamine neurons in the brain, mimicking the damage caused by Parkinson's disease.
Whereas, before treatment, animals were unable to use affected limbs, the FKBP ligand 
restored the ability of animals to feed themselves and gave improvements in measures of
locomotor activity, neurological outcome, and fine motor control. There were also 
corresponding increases in regrowth of damaged nerve terminals. These results demonstrate 
the utility of FKBP ligands for treatment of diseases of the CNS.

From the above description, two general approaches towards the design of non-
immunosuppressant, neuroimmunophilin ligands can be seen. The first involves the
construction of constrained cyclic analogs of FK-506 in which the FKBP binding domain is 
fixed in a conformation optimal for binding to FKBP. The advantages of this approach are 
that the conformation of the analogs can be accurately modeled and predicted by
computational methods, and the analogs closely resemble parent molecules that have proven
pharmacological properties. A disadvantage is that the difficult chemistry limits the 
numbers and types of compounds that can be prepared. The second approach involves the 
trial and error construction of acyclic analogs of the FKBP binding domain by conventional 
medicinal chemistry. The advantage of this approach is that the chemistry is suitable for
production of the numerous compounds needed for such interactive chemistry-bioassay
approaches. The disadvantages are that the molecular types of compounds that have 
emerged have no known history of appropriate pharmacological properties, have rather 
labile ester functional groups, and are too conformationally mobile to allow accurate 
prediction of conformational properties. 

The present invention provides useful methods and reagents related to the first
approach, but with significant advantages. The invention provides recombinant PKS genes
that produce a wide variety of polyketides that cannot otherwise be readily synthesized by 
chemical methodology alone. Moreover, the present invention provides polyketides that 
have either or both of the desired immunosuppressive and neurotrophic activities, some of
which are produced only by fermentation and others of which are produced by fermentation
and chemical modification. Thus, in one aspect, the invention provides compounds that 
optimally bind to FKBP but do not bind to the effector proteins. The methods and reagents 
of the invention can be used to prepare numerous constrained cyclic analogs of FK-520 in 
which the FKBP binding domain is fixed in a conformation optimal for binding to FKBP.
Such compounds will show neuroimmunophilin binding (neurotrophic) but not
immunosuppressive effects. The invention also allows direct manipulation of FK-520 and
related chemical structures via genetic engineering of the enzymes involved in the 
biosynthesis of FK-520 (as well as related compounds, such as FK-506 and rapamycin); 
similar chemical modifications are simply not possible because of the complexity of the
structures. The invention can also be used to introduce "chemical handles" into normally 
inert positions that permit subsequent chemical modifications. 

Several general approaches to achieve the development of novel neuroimmunophilin 
ligands are facilitated by the methods and reagents of the present invention. One approach is 
to make "point mutations" of the functional groups of the parent FK-520 structure that bind 
to the effector molecules to eliminate their binding potential. These types of structural 
modifications are difficult to perform by chemical modification, but can be readily 
accomplished with the methods and reagents of the invention. 

A second, more extensive approach facilitated by the present invention is to utilize
molecular modeling to predict optimal structures ab initio that bind to FKBP but not 
effector molecules. Using the available X-ray crystal structure of FK-520 (or FK-506) 
bound to FKBP, molecular modeling can be used to predict polyketides that should 
optimally bind to FKBP but not calcineurin. Various macrolide structures can be generated 
by linking the ends of the FKBP-binding domain with "all possible" polyketide chains of 
variable length and substitution patterns that can be prepared by genetic manipulation of the 
FK-520 or FK-506 PKS gene cluster in accordance with the methods of the invention. The 
ground state conformations of the virtual library can be determined, and compounds that 
possess binding domains most likely to bind well to FKBP can be prepared and tested. 
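
The idea of enumerating "all possible" polyketide chains of variable length and substitution pattern can be sketched as a simple combinatorial enumeration. This is only an illustration of how such a virtual library grows: the list of extender units, the function names, and the chain-length limits are assumptions introduced here, and no conformational modeling or binding prediction is attempted.

```python
from itertools import product

# Hypothetical set of extender-unit choices available at each chain position.
EXTENDER_UNITS = ("malonyl", "methylmalonyl", "ethylmalonyl", "2-methoxymalonyl")


def enumerate_chains(min_len, max_len, units=EXTENDER_UNITS):
    """Yield every ordered sequence of extender units with min_len..max_len positions."""
    for length in range(min_len, max_len + 1):
        yield from product(units, repeat=length)


def library_size(min_len, max_len, n_units=len(EXTENDER_UNITS)):
    """Closed-form count of the same library: sum of n_units**k over each length k."""
    return sum(n_units ** k for k in range(min_len, max_len + 1))


# Assumed example: linker chains of two to four extender units.
chains = list(enumerate_chains(2, 4))
print(len(chains), library_size(2, 4))
```

Because the count grows exponentially with chain length, a practical workflow would presumably filter candidates computationally before any are prepared and tested.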

Once a compound is identified in accordance with the above approaches, the 
invention can be used to generate a focused library of analogs around the lead candidate, to 
"fine tune" the compound for optimal properties. Finally, the genetic engineering methods 
of the invention can be directed towards producing "chemical handles" that enable 
medicinal chemists to modify positions of the molecule previously inert to chemical 
modification. This opens the path to previously prohibited chemical optimization of lead 
compounds by time-proven approaches. 

Moreover, the present invention provides polyketide compounds and the
recombinant genes for the PKS enzymes that produce the compounds that have significant 
advantages over FK-506 and FK-520 and their analogs. The metabolism and 
pharmacokinetics of tacrolimus have been extensively studied, and FK-520 is believed to be
similar in these respects. Absorption of tacrolimus is rapid, variable, and incomplete from 
the gastrointestinal tract (Harrison's Principles of Internal Medicine, 14th edition, 1998, 
McGraw Hill, 14, 20, 21, 64-67). The mean bioavailability of the oral dosage form is 27%
(range 5 to 65%). The volume of distribution (VolD) based on plasma is 5 to 65 L per kg of
body weight (L/kg), and is much higher than the VolD based on whole blood 
concentrations, the difference reflecting the binding of tacrolimus to red blood cells. Whole 
blood concentrations may be 12 to 67 times the plasma concentrations. Protein binding is 
high (75 to 99%), primarily to albumin and alpha1-acid glycoprotein. The half-life for
distribution is 0.9 hour; elimination is biphasic and variable, with a terminal half-life of 11.3 hours (range, 3.5 to
40.5 hours). The time to peak concentration is 0.5 to 4 hours after oral administration. 
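
The figures above lend themselves to a brief worked calculation. The sketch below uses only the bioavailability and terminal half-life values quoted in the text; the 10 mg oral dose is a hypothetical input, and treating elimination as a single exponential governed by the terminal half-life is a simplifying assumption, since the text describes elimination as biphasic and variable.

```python
# Values quoted in the text.
MEAN_BIOAVAILABILITY = 0.27          # mean oral bioavailability
BIOAVAILABILITY_RANGE = (0.05, 0.65)  # reported range
TERMINAL_HALF_LIFE_HR = 11.3          # terminal elimination half-life, hours


def systemic_amount(oral_dose_mg, bioavailability=MEAN_BIOAVAILABILITY):
    """Amount of an oral dose expected to reach systemic circulation."""
    return oral_dose_mg * bioavailability


def fraction_remaining(hours, half_life_hr=TERMINAL_HALF_LIFE_HR):
    """Fraction of drug remaining after `hours`, assuming single-exponential decay."""
    return 0.5 ** (hours / half_life_hr)


dose_mg = 10.0  # hypothetical oral dose, for illustration only
mean_mg = systemic_amount(dose_mg)
low_mg, high_mg = (systemic_amount(dose_mg, f) for f in BIOAVAILABILITY_RANGE)

print(f"absorbed: {mean_mg:.2f} mg (range {low_mg:.2f} to {high_mg:.2f} mg)")
print(f"fraction remaining after 24 h: {fraction_remaining(24):.2f}")
```

The thirteen-fold spread between the low and high ends of the bioavailability range illustrates why dosing of tacrolimus is difficult to predict.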

Tacrolimus is metabolized primarily by cytochrome P450 3A enzymes in the liver 
and small intestine. The drug is extensively metabolized with less than 1% excreted 
unchanged in urine. Because hepatic dysfunction decreases clearance of tacrolimus, doses 
have to be reduced substantially in primary graft non-function, especially in children. In 
addition, drugs that induce the cytochrome P450 3A enzymes reduce tacrolimus levels, 
while drugs that inhibit these P450s increase tacrolimus levels. Tacrolimus bioavailability 
doubles with co-administration of ketoconazole, a drug that inhibits P450 3A. See, Vincent 
et al., 1992, In vitro metabolism of FK-506 in rat, rabbit, and human liver microsomes:
Identification of a major metabolite and of cytochrome P450 3A as the major enzymes
responsible for its metabolism, Arch. Biochem. Biophys. 294: 454-460; Iwasaki et al., 1993,
Isolation, identification, and biological activities of oxidative metabolites of FK-506, a
potent immunosuppressive macrolide lactone, Drug Metabolism & Disposition 21: 971-977;
Shiraga et al., 1994, Metabolism of FK-506, a potent immunosuppressive agent, by
cytochrome P450 3A enzymes in rat, dog, and human liver microsomes, Biochem.
Pharmacol. 47: 727-735; and Iwasaki et al., 1995, Further metabolism of FK-506
(Tacrolimus); Identification and biological activities of the metabolites oxidized at multiple
sites of FK-506, Drug Metabolism & Disposition 23: 28-34. The cytochrome P450 3A
subfamily of isozymes has been implicated as important in this degradative process. 

Structures of the eight isolated metabolites formed by liver microsomes are shown in 
Figure 6. Four metabolites of FK-506 involve demethylation of the oxygens on carbons 13, 
15, and 31, and hydroxylation of carbon 12. The 13-demethylated (hydroxy) compounds 
undergo cyclizations of the 13-hydroxy at C-10 to give M-I, M-VI, and M-VII, and the 12-
hydroxy metabolite at C-10 to give I. Another four metabolites formed by oxidation of the 
four metabolites mentioned above were isolated using liver microsomes from dexamethasone-treated
rats. Three of these are metabolites doubly demethylated at the methoxy groups on
carbons 15 and 31 (M-V), 13 and 31 (M-VI), and 13 and 15 (M-VII). The fourth, M-VIII, 
was the metabolite produced after demethylation of the 31-methoxy group, followed by
formation of a fused ring system by further oxidation. Among the eight metabolites, M-II 
has immunosuppressive activity comparable to that of FK-506, whereas the other 
metabolites exhibit weak or negligible activities. Importantly, the major metabolite of 
human, dog, and rat liver microsomes is the 13-demethylated and cyclized FK-506 (M-I).

Thus, the major metabolism of FK-506 proceeds via 13-demethylation followed by
cyclization to the inactive M-I, this representing about 90% of the metabolic products after a 
10 minute incubation with liver microsomes. Analogs of tacrolimus that do not possess a C- 
13 methoxy group would not be susceptible to the first and most important 
biotransformation in the destructive metabolism of tacrolimus (i.e., cyclization of 13-
hydroxy to C-10). Thus, a 13-desmethoxy analog of FK-506 should have a longer half-life
in the body than does FK-506. The C-13 methoxy group is believed not to be required for 
binding to FKBP or calcineurin. The C-13 methoxy is not present at the corresponding position
of rapamycin, which binds to FKBP with an affinity equal to that of tacrolimus. Also, analysis of
the 3-dimensional structure of the FKBP-tacrolimus-calcineurin complex shows that the C-
13 methoxy has no interaction with FKBP and only a minor interaction with calcineurin.
The present invention provides C-13-desmethoxy analogs of FK-506 and FK-520, as well
as the recombinant genes that encode the PKS enzymes that catalyze their synthesis and 
host cells that produce the compounds.

These compounds exhibit, relative to their naturally occurring counterparts,
prolonged immunosuppressive action in vivo, thereby allowing a lower dosage and/or
reduced frequency of administration. Dosing is more predictable, because the variability in 
FK-506 dosage is largely due to variation of metabolism rate. FK-506 levels in blood can 
vary widely depending on interactions with drugs that induce or inhibit cytochrome P450 
3A (summarized in USP Drug Information for the Health Care Professional). Of particular
importance are the numerous drugs that inhibit or compete for CYP 3A, because they
increase FK-506 blood levels and lead to toxicity (Prograf package insert, FujisawaGUS,
Rev 4/97, Rec 6/97). Also important are the drugs that induce P450 3A (e.g.,
dexamethasone), because they decrease FK-506 blood levels and reduce efficacy. Because
the major site of CYP 3A action on FK-506 is removed in the analogs provided by the
present invention, those analogs are not as susceptible to drug interactions as the naturally
occurring compounds.

Hyperglycemia, nephrotoxicity, and neurotoxicity are the most significant adverse
effects resulting from the use of FK-506 and are believed to be similar for FK-520. Because 
these effects appear to occur primarily by the same mechanism as the immunosuppressive 
action (i.e., FKBP-calcineurin interaction), the intrinsic toxicity of the desmethoxy analogs
may be similar to that of FK-506. However, toxicity of FK-506 is dose related and correlates with
high blood levels of the drug (Prograf package insert, FujisawaGUS, Rev 4/97, Rec 6/97). 
Because the levels of the compounds provided by the present invention should be more 
controllable, the incidence of toxicity should be significantly decreased with the 13- 
desmethoxy analogs. Some reports show that certain FK-506 metabolites are more toxic
than FK-506 itself, and this provides an additional reason to expect that a CYP 3A-resistant
analog can have lower toxicity and a higher therapeutic index. 

Thus, the present invention provides novel compounds related in structure to FK- 
506 and FK-520 but with improved properties. The invention also provides methods for 
making these compounds by fermentation of recombinant host cells, as well as the
recombinant host cells, the recombinant vectors in those host cells, and the recombinant
proteins encoded by those vectors. The present invention also provides other valuable 
materials useful in the construction of these recombinant vectors that have many other 
important applications as well. In particular, the present invention provides the FK-520 PKS 
genes, as well as certain genes involved in the biosynthesis of FK-520 in recombinant form. 

FK-520 is produced at relatively low levels in the naturally occurring cells,
Streptomyces hygroscopicus var. ascomyceticus, in which it was first identified. Thus,
another benefit provided by the recombinant FK-520 PKS and related genes of the present
invention is the ability to produce FK-520 in greater quantities in the recombinant host cells
provided by the invention. The invention also provides methods for making novel FK-520
analogs, in addition to the desmethoxy analogs described above, and derivatives in
recombinant host cells of any origin.

The biosynthesis of FK-520 involves the action of several enzymes. The FK-520
PKS enzyme, which is composed of the fkbA, fkbB, fkbC, and fkbP gene products,
synthesizes the core structure of the molecule. There is also a hydroxylation at C-9,
mediated by the P450 hydroxylase that is the fkbD gene product, followed by oxidation by
the fkbO gene product to form a keto group at C-9. There is also a methylation at C-31
that is mediated by an O-methyltransferase that is the fkbM gene product. There are also
methylations at the C-13 and C-15 positions by a methyltransferase believed to be encoded
by the fkbG gene; this methyltransferase may act on the hydroxymalonyl CoA substrates
prior to binding of the substrate to the AT domains of the PKS during polyketide synthesis.
The present invention provides the genes encoding these enzymes in recombinant form. The
invention also provides the genes encoding the enzymes involved in ethylmalonyl CoA and
2-hydroxymalonyl CoA biosynthesis in recombinant form. Moreover, the invention
provides Streptomyces hygroscopicus var. ascomyceticus recombinant host cells lacking
one or more of these genes that can be used to produce useful compounds.

The cells are useful in production in a variety of ways. First, certain cells make a
useful FK-520-related compound merely as a result of inactivation of one or more of the
FK-520 biosynthesis genes. Thus, by inactivating the C-31 O-methyltransferase gene in
Streptomyces hygroscopicus var. ascomyceticus, one creates a host cell that makes a
desmethyl (at C-31) derivative of FK-520. Second, other cells of the invention are unable to
make FK-520 or FK-520-related compounds due to an inactivation of one or more of the
PKS genes. These cells are useful in the production of other polyketides produced by PKS
enzymes that are encoded on recombinant expression vectors and introduced into the host
cell.
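
The two uses of the knockout cells described above can be summarized in a small lookup, shown here as a minimal sketch. The gene roles are taken from the text; the function name `expected_outcome`, the dictionary layout, and the extension of the C-31 example to the other tailoring genes are illustrative assumptions, not results reported in this document.

```python
# Gene roles as described in the text; the data layout is illustrative only.
PKS_GENES = {"fkbA", "fkbB", "fkbC", "fkbP"}
TAILORING_STEPS = {
    "fkbD": "C-9 hydroxylation",
    "fkbO": "C-9 oxidation to keto group",
    "fkbM": "C-31 O-methylation",
    "fkbG": "C-13/C-15 methylation (believed)",
}


def expected_outcome(inactivated, complemented=frozenset()):
    """Coarse, assumption-laden summary of what a host cell would make.

    inactivated:  names of chromosomal genes that were inactivated.
    complemented: names of genes supplied on an expression vector.
    """
    missing = set(inactivated) - set(complemented)
    if missing & PKS_GENES:
        # An incomplete PKS cannot build the core structure.
        return "no FK-520-related polyketide (PKS incomplete)"
    lost = sorted(TAILORING_STEPS[g] for g in missing if g in TAILORING_STEPS)
    if not lost:
        return "FK-520"
    # Assumption: losing a tailoring gene simply omits that step.
    return "FK-520 derivative lacking: " + "; ".join(lost)


if __name__ == "__main__":
    print(expected_outcome({"fkbM"}))
    print(expected_outcome({"fkbA"}))
    print(expected_outcome({"fkbA"}, complemented={"fkbA"}))
```

Only the fkbM case (a C-31 desmethyl derivative) and the PKS-knockout case are stated in the text; the other entries are placeholders for the same reasoning.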

Moreover, if only one PKS gene is inactivated, the ability to produce FK-520 or an
FK-520 derivative compound is restored by introduction of a recombinant expression vector
that contains the functional gene in a modified or unmodified form. The introduced gene
produces a gene product that, together with the other endogenous and functional gene
products, produces the desired compound. This methodology enables one to produce
FK-520 derivative compounds without requiring that all of the genes for the PKS enzyme be
present on one or more expression vectors. Additional applications and benefits of such
cells and methodology will be readily apparent to those of skill in the art after consideration
of how the recombinant genes were isolated and employed in the construction of the
compounds of the invention.

The FK-520 biosynthetic genes were isolated by the following procedure. Genomic
DNA was isolated from Streptomyces hygroscopicus var. ascomyceticus (ATCC 14891)
using the lysozyme/proteinase K protocol described in Genetic Manipulation of
Streptomyces - A Laboratory Manual (Hopwood et al., 1986). The average size of the DNA
was estimated to be between 80 and 120 kb by electrophoresis on 0.3% agarose gels. A library
was constructed in the SuperCos™ vector according to the manufacturer's instructions and
with the reagents provided in the commercially available kit (Stratagene). Briefly, 100 µg of
genomic DNA was partially digested with 4 units of Sau3AI for 20 min in a reaction
volume of 1 mL, and the fragments were dephosphorylated and ligated to SuperCos vector
arms. The ligated DNA was packaged and used to infect log-stage XL1-Blue MR cells. A
library of about 10,000 independent cosmid clones was obtained.
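
To put the library size in context, the standard Clarke-Carbon relationship, P = 1 - (1 - f)^N, estimates the probability that any given genomic locus is present in a library of N clones, where f is the fraction of the genome carried by one insert. The sketch below applies it to the 10,000 clones reported above. The insert size (35 kb) and genome size (8 Mb) are assumed round numbers chosen for illustration; neither value is stated in this document.

```python
import math


def probability_represented(n_clones, insert_bp, genome_bp):
    """Clarke-Carbon: chance that a given locus is in at least one clone."""
    f = insert_bp / genome_bp  # fraction of the genome per insert
    return 1.0 - (1.0 - f) ** n_clones


def clones_needed(p, insert_bp, genome_bp):
    """Smallest library size giving probability p of containing a locus."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - f))


def fold_coverage(n_clones, insert_bp, genome_bp):
    """Total cloned DNA expressed in genome equivalents."""
    return n_clones * insert_bp / genome_bp


N_CLONES = 10_000        # from the text
INSERT_BP = 35_000       # ASSUMPTION: typical cosmid insert size
GENOME_BP = 8_000_000    # ASSUMPTION: order of magnitude for a Streptomyces genome

if __name__ == "__main__":
    print("fold coverage:", fold_coverage(N_CLONES, INSERT_BP, GENOME_BP))
    print("P(locus present):", probability_represented(N_CLONES, INSERT_BP, GENOME_BP))
    print("clones for P = 0.99:", clones_needed(0.99, INSERT_BP, GENOME_BP))
```

Under these assumptions the library is heavily redundant, which is consistent with the recovery of several overlapping cosmids for the same region.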
Based on recently published sequence from the FK-506 cluster (Motamedi and
Shafiee, 1998, Eur. J. Biochem. 256: 528), a probe for the fkbO gene was isolated from
ATCC 14891 using PCR with degenerate primers. With this probe, a cosmid designated
pKOS034-124 was isolated from the library. With probes made from the ends of cosmid
pKOS034-124, an additional cosmid designated pKOS034-120 was isolated. These cosmids
(pKOS034-124 and pKOS034-120) were shown to contain DNA inserts that overlap with
one another. Initial sequence data from these two cosmids generated sequences similar to
sequences from the FK-506 and rapamycin clusters, indicating that the inserts were from the
FK-520 PKS gene cluster. Two EcoRI fragments were subcloned from cosmids
pKOS034-124 and pKOS034-120. These subclones were used to prepare shotgun libraries by
partial digestion with Sau3AI, gel purification of fragments between 1.5 kb and 3 kb in size,
and ligation into the pLitmus28 vector (New England Biolabs). These libraries were sequenced
using dye terminators on a Beckman CEQ2000 capillary electrophoresis sequencer,
according to the manufacturer's protocols.
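
The shotgun step (partial Sau3AI digestion followed by size selection) can be modeled in silico as a rough sketch. Sau3AI recognizes GATC and cuts immediately 5' of the G; because the site is palindromic, scanning one strand is sufficient. The function names and the toy sequence are hypothetical, and a real partial digest yields a random subset of the fragments enumerated here.

```python
import re

SAU3AI_SITE = "GATC"  # Sau3AI cuts immediately 5' of the G


def cut_positions(seq, site=SAU3AI_SITE):
    """0-based indices at which the enzyme cuts (start of each site)."""
    return [m.start() for m in re.finditer("(?=%s)" % site, seq.upper())]


def boundaries(seq, site=SAU3AI_SITE):
    """Fragment boundaries: both sequence ends plus every internal cut."""
    cuts = [c for c in cut_positions(seq, site) if c > 0]
    return [0] + cuts + [len(seq)]


def complete_digest(seq, site=SAU3AI_SITE):
    """Fragments obtained when every site is cut."""
    b = boundaries(seq, site)
    return [seq[i:j] for i, j in zip(b, b[1:])]


def partial_digest(seq, site=SAU3AI_SITE):
    """Every fragment that some partial digest could produce."""
    b = boundaries(seq, site)
    return [seq[b[i]:b[j]]
            for i in range(len(b)) for j in range(i + 1, len(b))]


def size_select(fragments, min_len, max_len):
    """Keep fragments within a gel-purification window (lengths in bp)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


if __name__ == "__main__":
    toy = "AAGATCTTTTGATCCC"  # hypothetical sequence with two GATC sites
    print(complete_digest(toy))
    print(size_select(partial_digest(toy), 8, 14))
```

For the procedure in the text, the window passed to `size_select` would be 1500 to 3000.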

To obtain cosmids containing sequence on the left and right sides of the sequenced
region described above, a new cosmid library of ATCC 14891 DNA was prepared
essentially as described above. This new library was screened with a new fkbM probe
isolated using DNA from ATCC 14891. A probe representing the fkbP gene at the end of
cosmid pKOS034-124 was also used. Several additional cosmids to the right of the
previously sequenced region were identified. Cosmids pKOS065-C31 and pKOS065-C3
were identified and then mapped with restriction enzymes. Initial sequences from these
cosmids were consistent with the expected organization of the cluster in this region. More
extensive sequencing showed that both cosmids contained, in addition to the desired
sequences, other sequences not contiguous to the desired sequences on the host cell
chromosomal DNA. Probing of additional cosmid libraries identified two additional
cosmids, pKOS065-M27 and pKOS065-M21, that contained the desired sequences in a
contiguous segment of chromosomal DNA. Cosmids pKOS034-124, pKOS034-120,
pKOS065-M27, and pKOS065-M21 have been deposited with the American Type Culture
Collection, Manassas, VA, USA. The complete nucleotide sequence of the coding
sequences of the genes that encode the proteins of the FK-520 PKS is shown below but
can also be determined from the cosmids of the invention deposited with the ATCC using
standard methodology.

Referring to Figures 1 and 3, the FK-520 PKS gene cluster is composed of four open
reading frames designated fkbB, fkbC, fkbA, and fkbP. The fkbB open reading frame encodes
the loading module and the first four extender modules of the PKS. The fkbC open reading
frame encodes extender modules five and six of the PKS. The fkbA open reading frame
encodes extender modules seven, eight, nine, and ten of the PKS. The fkbP open reading
frame encodes the NRPS of the PKS. Each of these genes can be isolated from the cosmids
of the invention described above. The DNA sequences of these genes are provided below
preceded by the following table identifying the start and stop codons of the open reading
frames of each gene and the modules and domains contained therein.



Nucleotides                      Gene or Domain
complement (412 - 1836)          fkbW
complement (2020 - 3579)         fkbV
complement (3969 - 4496)         fkbR2
complement (4595 - 5488)         fkbR1
5601 - 6818                      fkbE
6808 - 8052                      fkbF
8156 - 8824                      fkbG
complement (9122 - 9883)         fkbH
complement (9894 - 10994)        fkbI
complement (10987 - 11247)       fkbJ
complement (11244 - 12092)       fkbK
complement (12113 - 13150)       fkbL
complement (13212 - 23988)       fkbC
complement (23992 - 46573)       fkbB
46754 - 47788                    fkbO
47785 - 52272                    fkbP
52275 - 71465                    fkbA
71462 - 72628                    fkbD
72625 - 73407                    fkbM
complement (73460 - 76202)       fkbN
complement (76336 - 77080)       fkbQ
complement (77076 - 77535)       fkbS
complement (44974 - 46573)       CoA ligase of loading domain
complement (43777 - 44629)       ER of loading domain
complement (43144 - 43660)       ACP of loading domain
complement (41842 - 43093)       KS of extender module 1 (KS1)
complement (40609 - 41842)       AT1
complement (39442 - 40609)       DH1
complement (38677 - 39307)       KR1
complement (38371 - 38581)       ACP1
complement (37145 - 38296)       KS2
complement (35749 - 37144)       AT2
complement (34606 - 35749)       DH2 (inactive)
complement (33823 - 34480)       KR2
complement (33505 - 33715)       ACP2
complement (32185 - 33439)       KS3
complement (31018 - 32185)       AT3
complement (29869 - 31018)       DH3 (inactive)
complement (29092 - 29740)       KR3
complement (28750 - 28960)       ACP3
complement (27430 - 28684)       KS4
complement (26146 - 27430)       AT4
complement (24997 - 26146)       DH4 (inactive)
complement (24163 - 24373)       ACP4
complement (22653 - 23892)       KS5
complement (21420 - 22653)       AT5
complement (20241 - 21420)       DH5
complement (19464 - 20097)       KR5
complement (19116 - 19326)       ACP5
complement (17820 - 19053)       KS6
complement (16587 - 17820)       AT6
complement (15438 - 16587)       DH6
complement (14517 - 15294)       ER6
complement (13761 - 14394)       KR6
complement (13452 - 13662)       ACP6
52362 - 53576                    KS7
53577 - 54716                    AT7
54717 - 55871                    DH7
56019 - 56819                    ER7
56943 - 57575                    KR7
57710 - 57920                    ACP7
57990 - 59243                    KS8
59244 - 60398                    AT8
60399 - 61412                    DH8 (inactive)
61548 - 62180                    KR8
62328 - 62537                    ACP8
62598 - 63854                    KS9
63855 - 65084                    AT9
65085 - 66254                    DH9
66399 - 67175                    ER9
67299 - 67931                    KR9
68094 - 68303                    ACP9
68397 - 69653                    KS10
69654 - 70985                    AT10
71064 - 71273                    ACP10
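
The coordinates in the table can be applied to the sequence that follows with a few lines of code. The sketch below parses entries written in the table's style, extracts the corresponding subsequence (reverse-complementing entries marked "complement"), and checks that a domain lies inside its parent open reading frame. Coordinates are treated as 1-based and inclusive, which is an assumption based on common sequence-annotation practice; the `Feature` type and function names are hypothetical.

```python
import re
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int        # 1-based, inclusive (assumed convention)
    end: int          # 1-based, inclusive
    complement: bool  # True if encoded on the reverse strand


_LOCATION = re.compile(r"^\s*(complement)?\s*\(?\s*(\d+)\s*-\s*(\d+)\s*\)?\s*$")
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_location(name, text):
    """Parse e.g. 'complement (23992 - 46573)' or '52275 - 71465'."""
    m = _LOCATION.match(text)
    if m is None:
        raise ValueError("unrecognized location: %r" % text)
    return Feature(name, int(m.group(2)), int(m.group(3)), m.group(1) is not None)


def extract(sequence, feature):
    """Return the feature's sequence in its own 5'-to-3' orientation."""
    sub = sequence[feature.start - 1:feature.end]
    if feature.complement:
        return sub.translate(_COMPLEMENT)[::-1]
    return sub


def contains(outer, inner):
    """True if inner lies within outer on the same strand."""
    return (outer.complement == inner.complement
            and outer.start <= inner.start
            and inner.end <= outer.end)


if __name__ == "__main__":
    fkbB = parse_location("fkbB", "complement (23992 - 46573)")
    loading = parse_location("CoA ligase of loading domain",
                             "complement (44974 - 46573)")
    fkbA = parse_location("fkbA", "52275 - 71465")
    ks7 = parse_location("KS7", "52362 - 53576")
    print(contains(fkbB, loading), contains(fkbA, ks7))
```

A check of this kind confirms, for instance, that the loading-domain entries fall within fkbB and that modules seven through ten fall within fkbA, as the text states.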



1 GATCTCAGGC ATGAAGTCCT CCAGGCGAGG CGCCGAGGTG GTGAACACCT CGCCGCTGCT

61 TGTACGGACC ACTTCAGTCA GCGGCGATTG CGGAACCAAG TCATCCGGAA TAAAGGGCGG 

121 TTACAAGATC CTCACATTGC GCGACCGCCA GCATACGCTG AGTTGCCTCA GAGGCAAACC 

181 GAAAGGGCGC GGGCGGTCCG CACCAGGGCG GAGTACGCGA CGAGAGTGGC GCACCCGCGC
241 ACCGTCACCT CTCTCCCCCG CCGGCGGGAT GCCCGGCGTG ACACGGTTGG GCTCTCCTCG
301 ACGCTGAACA CCCGCGCGGT GTGGCGTCGG GGACACCGCC TGGCATCGGC CGGGTGACGG
361 TACGGGGAGG GCGTACGGCG GCCGTGGCTC GTGCTCACGG CCGCCGGGCG GTCATCCGTC
421 GAGACGGCAC TCGGCGAGCA GGGACGCCTG GTCGGCACCT SCGGGCCGGA CGACCGTGTG
481 GTTCGCGGGC GGGCGGTGGC CGGTGGTGAG CCAGCTCTCC AGGGCGGTGA AGGCTGAGCG

541 GTGACACGGC AGCAAAGGCC GGAGTCGGTC GGGGAAGGTG TCGACGAGGG CGTCGGTGTG
601 CGTGCCGTCC TCGATGCGGT AGTAGCGGTA CCGGCC3CCA GGCCGCTGCC GGACATACGC
661 GCGTACACGT CGGAGCCCGG GCGGCAGGCA GCAGCACGTC GAGAGTGCCT GGATGGTGAT 
721 CAGCGGCTTG CCGATACGAC CGGTCAACGC GATGCGTTCC ACGGCCGCGT GGACGCCGGA 
781 GGAGCGGGTG GCGTAGTCGT AGTCGGCATC GCAGCCGGGG ACCGTCCCCG GGGCGCAATA

841 CGGTGTGCCG GCTTCCTTCT CCCCATCGAA GCCGGGGTCG AACTCCTCGC GGTAGACGCG
901 CTGCGTCAGA TCCCAGTAGA CCTCGTGGTG GTACGGCCAC AAGAACTCGG AGTCGGCCGG
961 GAACCCGGCG CGGAGCAGCG CCTCGCGCGC CTGGCCGGCT GCGGGGCCGC CTGCCGCGTA
1021 GGTGGGGTAG TCGCGCAGGG CGGCCGGCAG GAAGGTGAAG AGGTTGGGAC CCTCCGCGCG
1081 CCACAGGGTG CCTTCCCAGT CGACTCCTCC GTCGTACAGC TCGGGATGGT TCTCCAGCTG

1141 CCAGCGCACG AGGTAGCCGC CGTTGGACAT CCCGGTGACC AGGGTGCGCT CGAGCGGCCG 
1201 GTGGTAGCGC TGGGCGACCG ACGCGCGGGC GGCCCGGGTC AGCTGGGTGA GGCGGGTGTT 
1261 CCACTCGGCG ACGGCGTCGC CCGGCCGGGA GCCATCACGG TAGAACGCGG GGCCGGTGTT 
1321 GCCCTTGTCG GTGGCGGCGT AGGCGTAACC GCGGGCGAGC ACCCAGTCGG CGATGGCCCG 
1381 GTCGTTGGCG TACTGCTCGC GGTTACCGGG GGTGCCGGCC ACGACCAGGC CACCGTTCCA

1441 GCGGTCGGGC AGCCGGATGA CGAACTGGGC GTCGTGGTTC CACCCGTGGT TGGTGTTGGT
1501 GGTGGAGGTG TCGGGGAAGT AGCCGTCGAT CTGGATCCCG GGCACTCCGG TGGGAGTGGC

1561 CAGGTTCTTG GGCGTCAGCC CTGCCCAGTC CGCCGGGTCG GTGTGGCCGG TGGCCGCCGT
1621 TCCCGCCGTG GTCAGCTCGT CCAGGCAGTC GGCCTGCTGA CGTGCCGCCG CCGGGACACG 

1681 CAGCTGGGAC AGACGGGCGC AGTGACCGTC CGGGGCATCG GGAGCAGGCC GGGCCGTGGC

1741 CGGTGAGGGG AGCAGGACGG CGACTGCGGC CAGGGTGAGA GCGCCGAGGC CGGTGCGTCT
1801 TCTCGGGGCC CGTCCGACAC CGAGGGGCAG AACCATGGAG AGCCTCCAGA CGTGCGGATG
1861 GATGACGGAC TGGAGGCTAG GTCGCGCACG GTGGAGACGA ACATGGGTGC GCCCGCCATG
1921 ACTGAGGCCC CTCAGAGGTG GGCCGCCGCC ATGACGGGCG CGGGACCGCG GGCGCTCCGG 

1981 GGCGGTGCCC GCGGCCGCCA CCGGTTCCGG GTCCCCGGGT CAGGGACAGG TGTCGTTCGC
2041 GACGGTGAAG TAGCCGGTCG GCGACTCTTT CAAGGTGGTC GTGACGAAGG TGTTGTACAG 
2101 GCCCATGTTC TGGCCGGAGC CCTTGGCGTA GGTGTAACCG GCGCTCGTCG TGGCGCGGCC 
2161 CGCCTGGACG TGAGCGTAGT TGCCGGCGGT CCAGCAGACG GCCGTGGCAC CGGTCGTCTG 
2221 CGCGGTGACC GCGCCCGAGA GCGGTCGGGC CTTGCCGTCC GCGTCCCGGG CGGCGACCGC 

2281 GTAGGTGTGC GATGTGCCCG CCCTCAGGCC GGTGTCCGTG TACGACGTCG TGGCGGACGT
2341 GGTGATCTGG GCACCGTCGC GGTGGACGGC GTAGTCGGTG GCGCCGTCGA CGGGTTTCCA
2401 GGTCAGGCTG ATGGTGGTGT CGGTGGCGCC GGTGGCGGCC AGGCCGGACG GAGCGGGCAG
2461 CGAACCGGGG TCGGAGGCGG ATCCGCTCAG GCCGAAGAAC TGCGTGATCC AGTAGCTGGA
2521 ACAGATCGAG TCCAGGAAGT AGGCGGCGCC GGTGCTGCCG CACTGCTGTG CTCCGGTGCC

2581 GGGATCGACC GGGGTGCCGT GCCCGATGCC CGGCACCCGG TTCACCTCCA CGGCCACCGA
2641 TCCGTCCGCG GCCAGGTACT CCTCGTGCCG GGTGGAGTTC GGGCCGATCA CCGAGGTACG
2701 GTCCGGCGTC TGGGACACGC CGTGCACAGC GGTCCACTGG TCGCGCAACT CGTCGGCGTT
2761 GCGCGGCGCG ACGGTGGTGT CCTTGTCGCC GTGCCAGATG GCCACGCGCG GCCACGGGCC
2821 CGACCACGAG GGGTAGCCGT CACGGACCCG CCGCGCCCAC TGGTCCGCGG TCAGGTCGGT 

2881 CCCGGGGTTC ATGCACAGGT ACGCGCTGCT GACGTCGGTG GCACAGCCGA AGGGCAGGCC
2941 GGCGACGACC GCGCCGGCCT GGAAGACGTC CGGATAGGTG GCGAGCATCA CCGACGTCAT
3001 GGCACCGCCG GCGGACAGCC CGGTGATGTA GGTGCGCTGG GGGTCCGCGC CGTAGGCGGA
3061 GACGGTGTGA GCGGCCATCT GCCGGATCGA CGCGGCTTCG CCCTGGCCCC TGCGGTTGTC
3121 GCTGCTCTGG AACCAGTTGA AGCACCTGTT CGCGTTGTTC GACGACGTGG TCTCGGCGAA

3181 CACGAGCAGG AAGCCATAGC GGTCCGCGAA TGAGAGCAGG CCGGAGTTGT CGGCGTAGCC
3241 CTGGGCGTCC TGGGTGCAAC CGTGCAGGGC GAACACCACC GCCGGCTCCG CGGGCAGGGA 
3301 CGCGGGCCGG TAGACGTACA TGTTCAGCCG GCCCGGGTTC GTGCCGAAGT CCGCGACCTC 
3361 GGTCAGGTCC GCCTTGGTCA GACCGGGCTT GGCCAGGCCC GCCGCGGCGT GGGCCGTCGG
3421 CGCCGGGCCG AGCAGGGCCG CTCCGAGTAC GAGGGCCACG ACGGCCACGA GACGGGTGAG 

3481 CACCCCCCGC CGTCCCGGAC GCGACAACGA CCCGACCGGC GGCGAGGAGG AGAGGGGGAA
3541 CAGCGGGGTG AGGATTCCCC GGAACGGCGG CGGCTGCATG GCGGCTCCCT CGATGTCGTG
3601 GGGGGGACAC GGAGGGCTCC CTGACGTCGA TCAGTGGGAG CGCCCCGGTG CCCGGC^CCG
3661 TAGGGGTGGT TCAACCCGCA ACGGTATGGC CCGGAGCACC ACACCCCGCA CCGCGCGATG 
3721 TGCGCCCGGA CGGATTGTGT CGCCTTGCGG AATCTGATAC CCGGACGCGA CGAACGCCCC 

3781 ACCCGACACG GGTAGGGCGT CATGGTGTCC GACTCGGCCG GTCGGCCTTG CCTGCCCTGG
3841 ACGGACCGGG CGTCGGCGGA CCGGGCGTCG GCGGGCTGGG CGGTATGGCG GCCGAGGACG

3901 CCAGCCGCGT GGGGCGGCCG CGCCCAAGTG CAGTACGCCG ACCGTGGCCG GCGGGAGGGC
3961 CGGACCGGTC AGTGCAGTCC CGCGGCCCTG CGGGACCGCT CGTCCCAGAC GGGTTCCACC 

4021 GCGGCGAACC GGGGTCCGTG TCCGCGGCGG TAGACCATCA GTGTCCGCTC GAAGGTGA~G
4081 ACGATGACAC CGTCCTGGTT GTAGCCGATG GTGCGCACGC TGATGATGCC TACGTCAGGT
4141 CGGCTGGCGG ACTCCCGGGT GTTCAGGACC TCGGACTGCG AGTAGATGGT GTCGCCCTCG
4201 AAGACCGGGT TCGGCAGCCT GACCCGGTCC CAGCCGAGGT TGGCCATCAC ATGCTGGGAG
4261 ATGTCGGTGA CGCTCTGCCC GGTGACCAGG GCGAGGGTGA AGGTGGAGTC CACCAGCGGC
4321 TTGCCCCAGG TGGTGCCCGC CGAGTAGTGG CGGTCGAAGT GCAGCGGCGC GGTGTTCTGC
4381 GTCAGGAGCG TGAGCCAGGA GTTGTCGGTC TCCAGGACCG TGCGGCCCAG GGGGTGGCGG
4441 TACACGTCGC CGGTGGTGAA GTCCTCGAAG TAGCGGCCCT GCCAGCCCTC GACCACAGCG
4501 GTGCGGGTGG CGTCCTGGTC CGGGTTCTCA GTCGTCATGG CGCTCATTCT GGGAAGTCCC
4561 CGGTCCGCTG TGAAATGCCG AACCTTCACC GGGCTCATAC GTGCGGCGCA TGAGCCCTGG
4621 ACCGTACGTA GTCGTAGAAC CTCGCCACCA CTGGCGCGCG TGGTCCTCCG GCGAGTGTGA

4681 CCACGCCGAC CGTGCGCCGC GCCTGCGGGT CGTCGAGCGG CACGGCGACG GCGTGGTCAC
4741 CGGGCCCGGA CGGGCTGCCG GTGAGGGGGG CGACGGCCAC ACCGAGGCCG GCGGCGACCA
4801 GGGCCCGCAG CGTGCTCAGC TCGGTGCTCT CCAGGACGAC CCGCGGCACG AATCCGGCCG
4861 CGGCGCACAG CCGGTCGGTG ATCTGGCGCA GTCCGAAGAC CGGCTCCAGT GCCACGAACG
4921 CCTCATCGGC CAGCTCCGCG GTCCGCACCC GGCGGCGTCT GGCCAGCCGG TGTCCGGGTG
4981 GGACGAGCAG GCACAGTGCC TCGTCCCGCA GTGGTGTCCA CTCCACATCG TCCCCGGCGG
5041 GTCGTGGGCT GGTCAGCCCC AGGTCCAGCC TGCTGTTGCG GACGTCGTCG ACCACGGCGT 
5101 CGGCGGCGTC GCCGCGCAGT TCGAAGGTGG TGCCGGGAGC CAGCCGGCGG TACCCGGCGA
5161 GGAGGTCGGG CACCAGCCAG GTGCCGTAGG AGTGCAGGAA ACCCAGTGCC ACGGTGCCGG 
5221 TGTCGGGGTC GATCAGGGCG GTGATGCGCT GCTCGGCGCC GGAGACCTCA CTGATCGCGC 
5281 GCAGGGCGTG GGCGCGGAAG ACCTCGCCGT ACTTGTTGAG CCGGAGCCGG TTCTGGTGCC
5341 GGTCGAACAG CGGCACGCCC ACTCGTCGCT CCAGCCGCCG GATGGCCCTG GACAGGGTCG
5401 GCTGGGAGAT GTTGAGCCGT TCCGCGGTGA TCGTCACGTG CTCGTGCTCG GCCAAGGCCG
5461 TGAACCACTG CAACTCCCGT ATCTCCATGC AGGGACTATA CGTACCGGGC ATGGTCCTGG
5521 CGAGGTTTCG TCATTTCACA GCGGCCGGGC GGCGGCCCAC AGTGAGTCCT CACCAACCAG
5581 GACCCCATGG GAGGGACCCC ATGTCCGAGC CGCATCCTCG CCCTGAACAG GAACGCCCCG
5641 CCGGGGCCCT GTCCGGTCTG CTCGTGGTTT CTTTGGAGCA GGCCGTCGCC GCTCCGTTCG 
5701 CCACCCGCCA CCTGGCGGAC CTGGGCGCCC GTGTCATCAA GATCGAACGC CCCGGCAGCG 
5761 GCGACCTCGC CCGCGGCTAC GACCGCACGG TGCGTGGCAT GTCCAGCCAC TTCGTCTGGC
5821 TGAACCGGGG GAAGGAGAGC GTCCAGCTCG ATGTGCGCTC GCCGGAGGGC AACCGGCACC
5881 TGCACGCCTT GGTGGACCGG GCCGATGTCC TGGTGCAGAA TCTGGCACCC GGCGCCGCGG
5941 GCCGCCTGGC ATCGGCCACC AGGTCCTCGC GCGGAGCCAC CGAGGCTGAT CACCTGCGGA 
6001 CATATCCGGC TACGGCAGTA CCGGCTGCTA CCGCGGACCG CAAGGCGTAC GACCTCCTGG 
6061 TCCAGTGCGA AGCGGGGCTG GTCTCCATCA CCGGCACCCC CGAGACCCCG TCCAAGGTGG 
6121 GCCTGTCCAT CGCGGACATC TGTGCGGGGA TGTACGCGTA CTCCGGCATC CTCACGGCCC 
6181 TGCTGAAGCG GGCCCGCACC GGCCGGGGCT CGCAGTTGGA GGTCTCGATG CTCGAAGCCC
6241 TCGGTGAATG GATGGGATAC GCCGAGTACT ACACGCGCTA CGGCGGCACC GCTCCGGCCC
6301 GCGCCGGCGC CAGCCACGCG ACGATCGCCC CCTACGGCCC GTTCACCACG CGCGACGGGC 
6361 AGACGATCAA TCTCGGGCTC CAGAACGAGC GGGAGTGGGC TTCCTTCTGC GGTGTCGTGC 
6421 TACAACGCCC CGGTCTCTGC GACGACCCGC GCTTTTCCGG CAACGCCGAC CGGGTGGCGC
6481 ACCGCACCGA GCTCGACGCC CTGGTGAGCG AGGTGACGGG CACGCTCACC GGCGAGGAAC
6541 TGGTGGCGCG GCTGGAGGAG GCGTCGATCG CCTACGCACG CCAGCGCACC GTGCGGGAGT 
6601 TCAGCGAACA CCCCCAACTG CGTGACCGTG GACGCTGGGC TCCGTTCGAC AGCCCGGTCG 
6661 GTGCGCTGGA GGGCCTGATC CCCCCGGTCA CCTTCCACGG CGAGCACCCG CGGCGGCTGG 
6721 GCCGGGTCCC GGAGCTGGGC GAGCATACCG AGTCCGTCCT GGCGTGGCTG GCCGCGCCCC
6781 ACAGCGCCGA CCGCGAAGAG GCCGGCCATG CCGAATGAAC TCACCGGAGT CCTGATCCTG
6841 GCCGCCGTGT TCCTGCTCGC CGGCGTACGG GGGCTGAACA TGGGCCTGCT CGCGCTGGTC 
6901 GCCACCTTTC TGCTCGGGGT GGTCGCACTC GACCGAACGC CGGACGAGGT GCTGGCGGGT 
6961 TTCCCCGCGA GCATGTTCCT GGTGCTGGTC GCCGTCACGT TCCTCTTCGG GATCGCCCGC 
7021 GTCAACGGCA CGGTGGACTG GCTGGTACGT GTCGCGGTGC GGGCGGTGGG GGCCCGGGTG 
7081 GGAGCCGTCC CCTGGGTGCT CTTCGGCCTG GCGGCACTGC TCTGCGCGAC AGGCGCGGCC
7141 TCGCCCGCGG CGGTGGCGAT CGTGGCGCCG ATGAGCGTCG CGTTCGCCGT CAGGCACCGC 
7201 ATCGATCCGC TGTACGCCGG ACTGATGGCG GTGAACGGGG CCGCAGCCGG CAGTTTCGCC 
7261 CCCTCCGGGA TCCTGGGCGG CATCGTCCAC TCGGCGCTGG AGAAGAACCA TCTGCCCGTC 
7321 AGCGGCGGGC TGCTCTTCGC AGGCACCTTC GCCTTCAACC TGGCGGTCGC CGCGGTGTCA 
7381 TGGCTCGTCC TCGGGCGCAG GCGCCTCGAA CCACATGACC TGGACGAGGA CACCGATCCC
7441 ACGGAAGGGG ACCCGGCTTC CCGCCCCGGC GCGGAACACG TGATGACGCT GACCGCGATG
7501 GCCGCGCTGG TGCTGGGAAC CACGGTCCTC TCCCTGGACA CCGGCTTCCT GGCCCTCACC
7561 TTGGCGGCGT TGCTGGCGCT GCTCTTCCCG CGCACCTCCC AGCAGGCCAC CAAGGAGATC
7621 GCCTGGCCCG TGGTGCTGCT GGTATGCGGG ATCGTGACCT ACGTCGCCCT GCTCCAGGAG
7681 CTGGGCATCG TGGACTCCCT GGGGAAGATG ATCGCGGCGA TCGGCACCCC GCTGCTGGCC
7741 GCCCTGGTGA TCTGCTACGT GGGCGGTGTC GTCTCGGCCT TCGCCTCGAC CACCGGGATC
7801 CTCGGTGCCC TGATGCCGCT GTCCGAGCCG TTCCTGAAGT CCGGTGCCAT CGGGACGACC
7861 GGCATGGTGA TGGCCCTGGC GGCCGCGGCG ACCGTGGTGG ACGCGAGTCC CTTCTCCACC
7921 AATGGTGCTC TGGTGGTGGC CAACGCTCCC GAGCGGCTGC GGCCCGGCGT GTACCAGGGG

7981 .TGCTGTGGT GGGGCGCCGG GGTGTGCGCA CTGGCTCCCG CGGCCGCCTG GGCGGCCTTC
8041 GTGGTGGCGT GAGCGCAGCG GAGCGGGAAT CCCCTGGAGC CCGTTTCCCG TGCTGTGTCG 
8101 CTGACGTAGC GTCAAGTCCA CGTGCCGGGC GGGCAGTACG CCTAGCATGT CGGGCATGGC 
8161 TAATCAGATA ACCCTGTCCG ACACGCTGCT CGCTTACGTA CGGAAGGTGT CCCTGCGCGA 
8221 TGACGAGGTG CTGAGCCGGC TGCGCGCGCA GACGGCCGAG CTGCCGGGCG GTGGCGTACT 
8281 GCCGGTGCAG GCCGAGGAGG GACAGTTCCT CGAGTTCCTG GTGCGGTTGA CCGGCGCGCG 

8341 TCAGGTGCTG GAGATCGGGA CGTACACCGG CTACAGCACG CTCTGCCTGG CCCGCGGATT
8401 GGCGCCCGGG GGCCGTGTGG TGACGTGCGA TGTCATGCCG AAGTGGCCCG AGGTGGGCGA
8461 GCGGTACTGG GAGGAGGCCG GGGTTGCCGA CCGGATCGAC GTCCGGATCG GCGACGCCCG
8521 GACCGTCCTC ACCGGGCTGC TCGACGAGGC GGGCGCGGGG CCGGAGTCGT TCGACATGGT 
8581 GTTCATCGAC GCCGACAAGG CCGGCTACCC CGCCTACTAC GAGGCGGCGC TGCCGCTGGT 
8641 ACGCCGCGGC GGGCTGATCG TCGTCGACAA CACGCTGTTC TTCGGCCGGG TGGCCGACGA
8701 AGCGGTGCAG GACCCGGACA CGGTCGCGGT ACGCGAACTC AACGCGGCAC TGCGCGACGA
8761 CGACCGGGTG GACCTGGCGA TGCTGACGAC GGCCGACGGC GTCACCCTGC TGCGGAAACG 
8821 GTGACCGGGG CGATGTCGGC GGCGGTCAGC GTCAGCGTCG TCGGCGCGGG CCTCGCGGAG 
8881 GGCTCCAGAT GCAGGCGTTC GACGCCGGCG GCGGAAGCGC CCGCCACCTC GGACACGCAG
8941 GGGCAGTCGG AGTCCGCGAA GCCCGCGAAC CGGTAGGCGA TCTCCATCAT GCGGTTGCGG 
9001 TCCGTACGCC GGAAGTCCGC CACCAGGTGC GCCCCCGCGC GGGCGCCCTG GTCCGTGAGC 
9061 CAGTTCAGGA TCGTCGCACC GGCACCGAAC GACACGACCC GGCAGGACGT GGCGAGCAGT 
9121 TTCAGGTGCC ACGTCGACGG CTTCTTCTCC AGCAGGATGA TGCCGACGGC GCCGTGCGGG 
9181 CCGAAGCGGT CGCCCATGGT GACGACGAGG ACCTCATGGG CGGGATCGGT GAGCACGCGC 
9241 GCAGGTCGGC GTCGGAGTAG TGCACGCCGG TCGCGTTCAT CTGGCTGGTC CGCAGCGTCA 
9301 GTTCCTCGAC GCGGCTGAGT TCCTCCTCCC CCGCGGGTGC GATCGTCATG GAGAGGTCGA 
9361 GCGAGCGCAG GAAGTCCTCG TCGGGACCGG AGTACGCCTC CCGGGCCTGG TCGCGCGCGA 
9421 AACCCGCCTG GTACATCAGG CGGCGCCGAC GCGAGTCGAC CGTGGACACC GGCGGGCTGA
9481 ACTCCGGCAG CGACAGGAGC GTGGCCGCCT GCTCGGCCGG GTAGCACCGC ACCTCGGGCA
9541 GGTGGAACGC CACCTCGGCA CGCTCGGCGG GCTGGTCGTC GATGAACGCG ATCGTGGTCG 
9601 GTGCGAAGTT CAGCTCCGTG GCGATCTCGC GGACGGACTG CGACTTCGGC CCCCATCCGA 
9661 TGCGGGCCAG CACGAAGTAC TCCGCCACAC CGAGGCGTTC CAGACGCTCC CACGCGAGGT 
9721 CGTGGTCGTT CTTGCTCGCC ACCGCCTGGA GGATGCCGCG GTCGTCGAGC GTGGTGATCA 
9781 CCTCGCGGAT CTCGTCGGTG AGGACCACCT CGTCGTCCTC CAGCACGGTG CCCCGCCACA 
9841 AGGTGTTGTC CAGGTCCCAG ACCAGACACT TGACAATGGT CATGGCTGTC CTCTCAAGCC
9901 GGGAGCGCCA GCGCGTGCTG GGCCAGCATC ACCCGGCACA TCTCGCTGCT GCCCTCGATG 
9961 ATCTCCATGA GCTTGGCGTC GCGGTACGCC CGTTCGACGA CGTGTCCCTC TCTCGCGCCT 

10021 GCCGACGCGA GCACCTGTGC GGCGGTCGCG GCCCCGGCGG CGGCTCGTTC GGCGGCGACG 
10081 TGCTTGGCCA GGATCGTCGC GGGCACCATC TCGGGCGAGC CCTCGTCCCA GTGGTCGCTG 
10141 GCGTACTCGC ACACGCGGGC CGCGATCTGC TCCGCGGTCC ACAGGTCGGC GATGTGCCCG 
10201 GCGACGAGTT GGTGGTCGCC GAGCGGCCGG CCGAACTGCT CCCGGGTCCG GGCGTGGGCC 
10261 ACCGCGGCGG TGCGGCAGGC CCGCAGGATC CCGACGCAGC CCCAGGCGAC CGACTTGCGC 
10321 CCGTAGGCGA GTGACGCCGC GACCAGCATC GGCAGTGACG CGCCGGAGCC GGCCAGGACC 
10381 GCGCCGGCCG GCACACGCAC CTGGTCCAGG TGCAGATCGG CGTGGCCGGC GGCGCGGCAG 
10441 CCGGACGGCT TCGGGACGCG CTCGACGCGT ACGCCGGGGG TGTCGGCGGG CACGACCACC
10501 ACCGCACCGG AACCATCCTC CTGGAGACCG AAGACGACCA GGTGGTCCGC GTAGGCGGCG 
10561 GCAGTCGTCC AGACCTTGTG GCCGTCGACG ACAGCGGTGT CCCCGTCGAG CCGAACCCGC 
10621 GTCCGCATCG CCGACAGATC GCTGCCCGCC TGCCGCTCAC TGAAGCCGAC GGCCGCGAGT 
10681 TTCCCGCTGG TCAGCTCCTT CAGGAAGGTC GCCCGCTGAC CGGCGTCGCC GAGCCGCTGC
10741 ACGGTCCACG CGGCCATGCC CTGCGACGTC ATGACACTGC GCAGCGAACT GCAGAGGCTG 
10801 CCGACGTGTG CGGTGAACTC GCCGTTCTCC CGGCTGCCGA GTCCCAGACC GCCGTGCTCG 
10861 GCCGCCACTT CCGCGCAGAG CAGGCCGTCG GCGCCGAGCC GGACGAGCAG GTCGCGCGGC 
10921 AGTTCGCCGG ACGTGTCCCA CTCGGCGGCC CGGTCACCGA CAAGGTCGGT CAGCAGCGCG
10981 TCACGCTCAG GCATCGACGG CCCGCAGCCG GTGGACGAGT GCGACCATGG ACTCGACGGT
11041 ACGGAAGTTC GCGAGCTGGA GGTCCGGGCC GGCGATCGTG ACGTCGAACG TCTTCTCCAG
11101 GTACACGACC AGTTCCATCG CGAACAGCGA CGTGAGGCCG CCCTCCGCGA ACAGGTCGCG
11161 GTCCACGGGC CAGTCCGACC TGGTCTTCGT CTTGAGGAAC GCGACCAACG CGTGCGCGAC
11221 GGGGTCGTCC TTGACGGGTG CGGTCATGAG AACACCTTCT CGTATTCGTA GAAGCCCCGG
11281 CCGGTCTTCC GGCCGTGGTG TCCCTCGCGG ACCTTGCCCA GCAGCAGGTC ACAGGGGCGG
11341 CTGCGCTCGT CGCCGGTGCG TTTGTGCAGC ACCCACAGCG CGTCGACGAG GTTGTCGATG
11401 CCGATCAGGT CCGCGGTGCG CAGCGGCCCG GTCGGATGGC CGAGGCACCC CGTCATGAGC
11461 GCGTCGACGT CCTCGACGGA CGCGGTGCCC TCCTGCACGA TCCGCGCCGC GTCGTTGATC
11521 ATCGGGTGGA GCAGCCGGCT CGTGACGAAG CCGGGCGCGT CCCGGACGAC GATCGGCTTG
11581 CGCCGCAGCG CCGCGAGCAG GTCCCZGGZG GCGGCCATGG CCTTCTCACC GGTCCGGGGT 
11641 CCGCGGATCA CCTCGACCGT CGGGATCASS TACGACGGGT TCATGAAGTG CGTGCCGAGC 
11701 AGGTCCTCGG GCCGGGCCAC GGAGTCGSCC AGTTCGTCAA CCGGGATCGA CGACGTGTTC
11761 GTGATGACCG GGATACCGGG CGCCGCTGCC GAGACCGTGG CGAGTACCTC CGCCTTGACC
11821 TCGGCGTCCT CGACGACGGC CTCGATCACC GCGGTGGCCG TACCGATCGC GGGCAGCGCG 
11881 GACGTGGCCG TCCGCAGCAC ACCGGGGTCG GCCTCGGCGG GCCCGGCCAC GAGTTGTGCC 
11941 GTCCGCAGTT CGGTGGCGAT CCGCGCCCGC GCCGCCGTAA GGATCTCCTC GGACGTGTCG 
12001 ACGAGTGTCA CCGGGACGCC GTGGCGCAGC GCGAGCGTGG TGATGCCGGT GCCCATCACT
12061 CCCGCGCCGA GCACGATCAG CTGGTGGTCC ACGCTGTTTC CTCCCTCCGG GGTCACCATG 
12121 GCAGCGAGTA CGGGTCGAGG ACGTCTTCCG GGGTCGACCC GATCGCGTCC TTGCGGCCGA 
12181 GGCCGAGTTC GTCGGCGAAG CCGAGCAGCA CGTCGAACGC GATGTGGTCG GCGAACGCGC 

12241 TGCCCGTCGA GTCGAGGACG CTCAGGCTGT CCCGGTGGTC CGCCGCGGTG TCCGGTGCCG
12301 CGCACAGGGC CGCCAGCGAC GGGCCGAGCT CGCGGTCCGG CAGTTGCTGG TACTCGCCCT

12361 CGGCGCGGGC CTGCCCCGGA TGGTCGACGC AGATGAACGC GTCGTCGAGC AGGGTCTTCG

12421 GCAGTTCGGT CTTGCCCGGC TCGTCGGCGC CGATGGCGTT CACATGCAGG TGCGGCAGCC
12481 GCGGCTCGGC GGGCAGCACC GGCCCTTTGC CCGAGGGCAC CGAGGTGACG GTGGACAGGA
12541 CATCCGCGGC GGCGGCGGCC TCCGCCGGAT CGGTCACCTT GACCGGCAGT CCGAGGAACG 
12601 CGATGCGGTC CGCGAACGAC GCCGCGTGGC CGGGGTCGGT GTCGCTGACC AGGATCCGCT 
12661 CGATGGGCAG GACCCTGCTG AGCGCGTGCG CCTGGGTCAC CGCCTGTGCG CCCGCGCCGA
12721 TCAGCGTGAG CGTGGCGCTG TCGGACCGGG CCAGCAGCCG GCTCGCGACG GCGGCGACCG

12781 CGCCGGTCCG CATCGCGGTG ATCACGCCTG CGTCGGCGAG GGCGGTCAGA CTGCCGCTGT

12841 CGTCGTCGAG GCGCGACATC GTGCCGACGA TCGTCGGCAG CCGGAAGCGC GGATAGTTGT
12901 GCGGACTGTA CGAAACCGTC TTCATGGTCA CGCCGACACC GGGGACCCGG TACGGCATGA

12961 ACTCGATGAC GCCGGGAATG TCGCCGCCGC GGACGAATCC GGTACGCGGC GGCGCCTCGG
13021 CGAACTCGCC GCGGCCGAGC GCGGCGAACC CGTCGTGCAG CTCGCTGATC AGCCGGTCCA 
13081 TCATCACGTC GCGGCCGATC ACGGAGAGAA TCCGCTTGAT GTCACGTTGG CGCAGGACCC 
13141 TGGTCTGCAT GTGTCACCTC CCTTTCGTGG CCGGAGCTGT CTTGGTGGTG CCGCTCGGGG 
13201 CGGCTTCCGT TCTCATCGCA GCTCCCTGTC GATGAGGTCG AAAATCTCGT CCGCGGTCGC 
13261 GTCCGCGGAC AGCACGCCGG CCGGCGTGGT CGGGCGGGTC TCCCGCCGCC AGCGGTTGAG

13321 CAGGGCGTCC AGCCGGGTTC CGATCGCGTC CGCCTGGCGG GCGCCCGGGT CGACACCGGC
13381 AACGAGTGCT TCCAGCCGGT CGAGCTGCGC GAGCACCACG GTCACCGGGT CGTCCGGGGA
13441 CAGCAGTTCA CCGATGCGGT CGGCGAGTGC GCGCGGCGAC GGGTAGTCGA AGACGAGCGT
13501 GGCGGACAGT CGCAGACCGG TCGCCTCGTT GAGGCCGTTG CGCAGCTGCA CCGCGATGAG 
13561 CGAGTCCACA CCGAGTTCCC GGAACGCCGC GTCCTCCGGG ATGTCCTCCG GGTCGGCGTG 
13621 GCCCAGGACG GCCGCTGCCT TCTGCCGGAC GAGGGCGAGC AGGTCGGTGG GGCGTTCCTG
13681 CTCGTTGCGG GCGCTCCGGC GGGCCGACGG CTTGGGCCGG CCACGCAGCA GCGGGAGGTC
13741 CGGCGGCAGG TCGCCCGCCA CGGCGACGAC ACTGCCCGTT CCGGTGTGGA CGGCGGCGTC 
13801 GTACATGCGC ATGCCCTGTT CGGCGGTGAG CGCGCTCGCC CCACCCTTGC GCATACGGCG 
13861 CCGGTCGGCG TCGGTCAGGT CCGCGGTCAG GCCACTCGCC TGGTCCCACA GCCCCCACGC
13921 GATCGACAGC CCTGGCAGCC CTTGTGCACG CCGGTGTTCG GCGAGCGCGT CGAGGAACGC 

13981 GTTCGCCGCC GCGTAGTTGC CCTGACCGGG GGTGCCCAGC ACACCGGCCG CCGACGAGTA

14041 GACGACGAAT GCGGCGAGGT CGGTGTCGCG GGTGAGCCGG TGCAGGTGCC AGGCGGCGTC
14101 GGCCTTGGGT TTGAGGACGG TGTCGATGCG GTCGGGGGTG AGGTTGTCGA GCAGGGCGTC 
14161 GTCGAGGGTT CCGGCGGTGT GGAAGACGGC GGTGAGGGGT TGAGGGATGT GGGCGAGGGT 
14221 GGTGGCGAGT TGGTGGGGGT CGCCGACGTC GCAGGGGAGG TGGGTGCCGG GGGTGGTGTC
14281 GGGGGGTGGG GTGCGGGAGA GGAGGTAGGT GTGGGGGTGG TTCAGGTGGC GGGCGAGGAT
14341 GCCGGCGAGG GTGCCGGAGC CGCCGGTGAT GACGACGGCC CCCTCGGGGT CCAGCGGCCG
14401 CGGGACCGTG AGGACGATCT TGCCGGTGTG CTCGCCGCGG CTCATGGTCG CCAGCGCCTC
14461 GCGGACCTGC CGCATGTCGT GCACCGTCAC CGGCAGCGGG TGCAGCACAC CGCGCGCGAA
14521 CAGGCCGAGC AGCTCCGCGA TGATCTCCTT GAGCCGGTCG GGCCCCGCGT CCATCAGGTC
14581 GAAGGGTCGC TGGACGGCGT GCCGGATGTC CGTCTTCCCC ATCTCGATGA ACCGGCCACC
14641 CGGCGCGAGC AGGCCGACGG ACGCGTCGAG GAGTTCACCG GTGAGCGAGT TGAGCACGAC
14701 GTCGACCGGC GGGAACGCGT CGGCGAACGC GGTGCTGCGG GAATCGGCCA GATGCGCTCC
14761 GTCCAGGTCC ACCAGATGGC GCTTCGCGGC GCTGGTGGTC GCGTACACCT CCGCGCCCAG
14821 GTGCCGCGCG ATCTGCCGGG CGGCGGAACC GACACCGCCG GTGGCCGCGT GGATCAGGAC
14881 CTTCTCGCCG GGGCGCAGCC CGGCGAGGTC GACCAGGCCG TACCACGCGG TCGCGAACGC
14941 GGTCATCACG GACGCCGCCT GCGGGAACGT CCAGCCGTCC GGCATCCGGC CGAGCATCCG
15001 GTGGTCGGCG ATGACCGTGG GGCCGAAGCC GGTGCCGACG AGGCCGAAGA CGCGGTCGCC 
15061 CGGTGCCAGA CCGGAGACGT CGGCGCCGGT CTCCAGGACG ATGCCCGCGG CCTCGCCGGC 
15121 GAGCACGCCC TGACCGGGGT AGGTGCCGAG CGCGATCAGC ACATCGCGGA AGTTGAGGCC 
15181 CGCCGCACGC ACACCGATCC GGACCTCGGC CGGGGCGAGG GGGCGCCGGG GCTCCGCCGA 
15241 GTCGGCCGCG GTGAGGCCGT CGAGGGTGCC CGTCCGCGCC GGCCGGATCA GCCACGTGTC 
15301 GCTGTCCGGC ACGGTGAGCG GCTCCGGCAC CCGGGTGAGG CGGGCCGCCT CGAACCGGCC 
15361 GCCGCGCAGC CGCAGACGCG GCTCGCCGAG TGCGACGGCG ATGCGCTGCT GCTCGGGGGC 
15421 GAGCGTGACG CCGGACTCGG TCTCGACGTG GACGAACCGG CCGGGCTGCT CGGCCTGGGC
15481 GGCGCGCAGC AGTCCGGCCG CCGCGCCGGT GGCGAGGCCC GCGGTGGTGT GCACGAGCAG 
15541 ATCCCCGCCG GAGCCGGTCA GGGCGGTCAG CAGCCGGGTG GTGAGCGCAC GCGTCTCGGC 
15601 CACCGGGTCG TCGCCATCAG CGGCAGGCAA CGTGATGACG TCCACGTCGG TCGCGGGGAC 
15661 ATCCGTGGGT GCGGCGACCT CGATCCAGGT GAGACGCATC AGGCCGGTGC CGACGGGTGG 
15721 GGACAGCGGG CGGGTGCGGA CCGTCCGGAT CTCGGCGACG AGTTGGCCGG CGGAGTCGGC 
15781 GACGCGCAGA CTCAGCTCGT CGCCGTCACG AGTGATCACG GCTCGGAGCA TGGCCGAGCC 
15841 CGTGGCGACG AACCGGGCCC CCTTCCAGGC GAACGGCAGA CCCGCAGCGC TGTCGTCCGG 
15901 CGTGGTGAGG GCGACGGCGT GCAGGGCCGC GTCGAGCAGC GCCGGATGCA CACCGAAACC 
15961 GTCCGCCTCG GCGGCCTGCT CGTCGGGCAG CGCCACCTCG GCATACACGG TGTCACCATC 
16021 ACGCCAGGCA GCCCGCAACC CCTGGAACGC CGACCCGTAC TCATAACCGG CATCCCGCAG 
16081 TTCGTCATAG AACCCCGAGA CGTCGACGGC CACGGCCGTG ACCGGCGGCC ACTGCGAGAA 
16141 CGGCTCCACA CCGACAACAC CGGGGGTGTC GGGGGTGTCG GGGGTCAGGG TGCCGCTGGC 
16201 GTGCCGGGTC CAGCTGCCCG TGCCCTCGGT ACGCGCGTGG ACGGTCACCG GCCGCCGTCC 
16261 GGCCTCATCA GCCCCTTCCA CGGTCACCGA CACATCCACC GCTGCGGTCA CCGGCACCAC 
16321 AAGGGGGGAT TCGATGACCA GCTCGTCCAC TATCCCGCAA CCGGTCTCGT CACCGGCCCG 
16381 GATGACCAGC TCCACAAACG CCGTACCCGG CAGCAGGACC GTGCCCCGCA CCGCGTGATC 
16441 AGCCAGCCAG GGGTGAGTGC GCAATGAGAT CCGGCCAGTG AGAACAACAC CACCATCGTC 
16501 GGCGGGCAGC GCTGTGACAG CGGCCAGCAT CGGATGCGCC GCACCCGTCA ACCCCGCCGC 
16561 CGACAGATCG GTGGCACCGG CCGCCTCCAG CCAGTACCGC CTGTGCTCGA ACGCGTACGT
16621 GGGCAGATCC AGCAGCCGTC CCGGCACCGG TTCGACCACC GTGTCCCAGT CCACTGCCGT 
16681 GCCCAGGGTC CACGCCTGCG CCAACGCCGT CAGCCACCGC TCCCAGCCGC CGTCACCGGT 
16741 CCGCAACGAC GCCACCGTGT GAGCCTGCTC CATCGCCGGC AGCAGCACCG GATGGGCACT 
16801 GCACTCCACG AACACCGACC CATCCAGCTC CGCCACCGCC GCGTCCAACG CCACCGGACG 
16861 ACGCAGATTC CGGTACCAGT ACCCCTCATC CACCGGCTCC GTCACCCAGG CGCTGTCCAC
16921 GGTCGACCAC CACGCCACCG ACGCGGCCTT CCCTGCCACC CCCTCCAGTA CCTTGGCCAG 
16981 TTCATCCTCG ATGGCTTCCA CGTGGGGCGT GTGGGAGGCG TAGTCGACCG CGATACGACG 
17041 CACCCGCACG CCTTCGGCCT CATACCGCGC CACCACCTCC TCCACCGCCG ACGGGTCCCC
17101 CGCCACCACC GTCGAAGCCG GGCCGTTACG CGCCGCGATC CACACACCCT CGACCAGACC 
17161 GACCTCACCG GCCGGCAACG CCACCGAAGC CATCGCTCCC CGCCCGGCCA GTCGCGCCGC 
17221 GATGACCTGA CTGCGCAATG CCACCACGCG GGCGGCGTCC TCGAGGCTGA GGGCTCCGGC 
17281 CACGCACGCC GCCGCGATCT CGCCCTGGGA GTGTCCGATC ACCGCGTCCG GCACGACCCC 
17341 ATGCGCCTGC CACAGCGCGG CCAGGCTCAC CGCGACCGCC CAGCTGGCCG GCTGGACCAC
17401 CTCCACCCGC TCCGCCACAT CCGGCCGCGC CAACATCTCC CGCACATCCC AGCCCGTGTG
17461 CGGCAGCAAC GCCTGAGCGC ACTCCTCCAT ACGCGCGGCG AACACCGCGG AGTGGGCCAT
17521 GAGTTCCACG CCCATGCCGA CCCACTGGGC GCCCTGGCCG GGGAAGACGA ACACCGTACG
17581 CGGCTGGTCC ACCGCCACAC CCGTCACCCG GGCATCGCCC AGCAGCACCG CAOGGTGACC 
17641 GAAGACAGCA CGCTCCCGCA CCAACCCCTG CGCGACCGCG GCCACATCCA CACCACCCCC
17701 GCGCAGATAC CCCTCCAGCC GCTCCACCTG CCCCCGCAGA CTCACCTCAC CACGAGCCGA 
17761 CACCGGCAAC GGCACCAACC CGTCAACAAC CGACTCCCCA CGCGACGGCC CAGGAACACC
17821 CTCAAGGATC ACGTGCGCGT TCGTACCGCT CACCCCGAAC GACGACACAC CCGCATGCGG 
17881 TGCCCGATCC GACTCGGGCC ACGGCCTCGC CTCGGTGAGC AGCTCCACCG CACCGGCCGA
17941 CCAGTCCACA TGCGACGACG GCTCGTCCAC ATGCAGCGTC TTCGGCGCGA TCCCGTACCG
18001 CATCGCCATG ACCATCTTGA TCACACCGGC GACACCCGCC GCCGCCTGCG CATGACCGAT 
18061 GTTCGACTTC AACGAACCCA GCAGCAGCGG AACCTCACGC TCCTGCCCGT ACGTCGCCAG 
18121 AATGGCCTGC GCCTCGATGG GATCGCCCAG CGTCGTCCCC GTCCCGTGCG CCTCCACCAC 
18181 GTCCACATCG GCGGCGCGCA GTCCGGCGTT CACCAACGCC TGCTGGATGA CACGCTGCTG 
18241 GGACGGGCCG TTGGGGGCGG ACAGCCCGTT GGAGGCACCG TCCTGGTTCA CCGCGGACCC
18301 GCGGACGACC GCGAGAACGG TGTGTCCGTT GCGCTCGGCG TCGGAGAGCC GCTCCAGCAC
18361 AAGAACGCCG GCGCCCTCCG CCCAGCCGGT GCCGTTGGCG GCGTCCGCGA ACGCGCGGCA
18421 GCGGCCGZCG GGGGAGAGTC CGCCCTGCTG CTGGAATTCC ACGAACCCGG TCGGGGTCGC
18481 CATGACGGTG ACACCGCCGA CCAGCGCCAG CGAGCACTCC CCGTGGCGCA GTGCGTGCCC
18541 GGCCTGGTGC AGCGCGACCA GCGACGACGA GCACGCCGTG TCCACCGTGA ACGCCGGTCG
18601 CTGGAGCCCA TAGAAGTACG AGATCCGGCC GGTGAGCACG CTGGGCTGCA TGCCGATCGA
18661 GCCGAACCCG TCCAGGTCCG CGCCGACGCC GTACCCGTAC GAGAAGGCGC CCATGAACAC
18721 GCCGGTGTCG CTGCCGCGCA GTGTGCCCGG CACGATGCCC GCGCTCTCGA ACGCCTCCCA
18781 TGTCGT77CC AGCAGGATCC GCTGCTGGGG GTCCATGGCC CGTGCCTCAC GGGGGCTGAT
18841 GCCGAAGAAC GCGGCATCGA AGCCGGCGGC GTCGGAGAGG AAGCCGCCGC GGTCCGTGTC
18901 CGATCCGCCG GTGAGGCCGG ACGGGTCCCA GCCACGGTCG GCCGGGAAGC CGGTGACCGC
18961 GTCGCCGCCA CTGTCCACCA TGCGCCACAG GTCGTCGGGC GAGGTGACGC CGCCCGGCAG
19021 TCGGCAGGCC ATGCCCACGA TGGCCAGCGG TTCGTCACGG GTCGCGGCGG CTGTGGGAAC
19081 AGCGACCGGT GCGGCACCAC CGACCAGAGC CTCGTCCAAC CGCGACGCGA TGGCCCGCGG
19141 CGTCGGGTAG TCGAAGACAA GCGTGGCGGG CAGTCGGACA CCGGTCGCCG CGGCGAGTCG
19201 GTTCCGCAGT TCGACGGCGG TCAGCGAGTC GATACCCAGT TCCTTGAAGG CCGCGTCCGC
19261 GGACACGTCC GCGGCGTCCG CGTGGCCGAG CACCGCCGCC GCGTTGTCGC GGACCAGTGC
19321 CAGCAGCGCG GTGTCCCGCT CAGCGCCGGA CATGGTGCCG AGCCGGTCGG CGAGCGGAAC
19381 GGCGGTGGCC GCCGCCGGGC GCGATACGGC GCGGCGCAGA TCGGCGAAAA GCGGCGATGT
19441 GTGCGCGGTG AGGTCCATCG TGGCCGCCAC GGCGAACGCG GTGCCGGTTC CGGCCGCGGC
19501 TTCCAGCAGG CGCATGCCCA CACCGGCCGA CATGGGGCGG AAACCGCCGC GGCGGACACG
19561 GGTGCGGTTG GTGCCGCTCA TGCTGCCGGT GAGTCCGCTG TCATCGGCCC AGAGGCCCCA
19621 GGCCAGCGAC AGCGCGGGCA GTCCTTCGGC ATGGCGCAGC GTCGCGAGTC CGTCGAGGAA
19681 CCCGTTCGCC GCCGAGTAGT TGCCCTGGCC GCGGCCGCCC ATGATGCCCG CGACGGACGA
19741 GTAGAGGACG AACGAGCGCA GGTCCGCGTC CCGGGTCAGC TCGTGCAGGT GCCAGGCGCC
19801 GTCGGCTTTG GGGCGCAGTG TGGTGGCGAG CCGCTCCGGG GTGAGTGCCG TGGTCACGCC
19861 GTCGTCGAGC ACGGCTGCCG TGTGGAAGAC CGCCGTGAGC GGCCTGCCGG CGGCGGCGAG
19921 CGCGGCGGCG AGCTGGTCCC GGTCGGCGAC GTCACAGCGG ATGTGGACAC CGGGAGTGTC
19981 CGCCGGCGGT TCGCTGCGCG AGAGCAACAG GAGGTGGCGG GCGCCATGCT CGGCGACGAG
20041 ATGCCGGGCG AGGAGACCTG CCAGCACACC CGAGCCGCCG GTGATGACCA CCGTGCCGTC
20101 CGGGTCGAGC AGCGGTTCGG GCGTTTCCGC GGCGGCCGTG CGGGTGAACC GCGGCGCTTC
20161 GTACCGGCCG TCGGTGACGC GGACGTACGG CTCGGCCAGT GTCGTGGCGG CGGCCAGCGC
20221 CTCGATGGGG GTGTCGGTGC CGGTCTCCAC CAGCACGAAC CGGCCCGGGT GCTCGGCCTG
20281 GGCGGACC3G ACGAGGCCGG CGACCGCTCC TCCGACCGGT CCCGCGTCGA TCCGGACGAC
20341 GAGGGTGGTC TCCGCAGGGC CGTCCTCGGC GATCACCCGG TGCAGCTCGC CGAGCACGAA
20401 CTCGGTGAGC CGGTACGTCT CGTCGAGGAC ATCCGCGCCC GGTTCCGGGA GCGCGGAGAC
20461 GATGTGGACG GCGTCCGCAG GACCGGGCCC GGGAGTGGGC AGCTCGGTCC AGGAGAGGCC
20521 GTACAAGGAG TTCCGTACGA CGGCGGCGTC GCCGTCGACG TTCACCGGTC GCGCGGTCAG
20581 CGCGGCGACG GTCACCACCG GTTGGCCGAC CGGGTCCGTC GCATGCACGG CAGCGCCGTC
20641 CGGGCCCTGA GTGATCGTGA CGCGCAGCGT GGTGGCCCCG GTCGTGTGGA ACCGCACGCC
20701 GCTCCACGAG AACGGCAGCC GCACCTCCGC TTCCTGTTCC GCGAGCAGCG GCAGGCAGGT
20761 GACGTGCAAG GCCGCGTCGA ACAGCGCCGG GTGGACGCCA TAGTGCGGCG TGTCGTCCGC
20821 CTGTTCCCCG GCGATCTCCA CCTCGGCGTA CAGGGTTTCG CCGTCGCGCC AGGCGGTGCG
20881 CAGTCCCTGG AACGCTGGGC CGTAGCTGTA GCCGGTGTCG GCCAGCCGCT CGTAGAACGC
20941 GCTCACGTCG ACGCGTCGCG CGCCCGGCGG CGGCCACGCG GGCGGCGGGA CCGCCGCGAC
21001 GCTTCCGGCC CGGCCGAGGG TGCCGCTGGC GTGCCGGGTC CAGCTGTCCG TGCCCTCGGT
21061 ACGCGCGTGG ACGGTCACTC GCCGCCGTCC GGCCTCATCG GCCCCTTCGA CGGTCACCGA
21121 CACATCCACC GCGCCGGTCA CCGGCACCAC GAGCGGGGTC TCGATGACCA GTTCATCCAC
21181 CACCCCGCAA CCGGTCTCGT CACCGGCCCG GATGACCAGC TCCACAAACG CCGTACCCGG
21241 CAGCAGAACC GTGCCGCGCA CCGCGTGATC AGCCAGCCAG GGATGCGTAC GCAACGAGAT
21301 CCGGCCAGTG AGAACAACAC CACCACCGTC GTCGGCGGGC AGTGCTGTGA CGGCGGCCAG
21361 CATCGGA7GC GCCGCCCCGG TCAGCCCGGC CGCGGACAGA TCGGTGGCAC CGGCCGCCTC
21421 CAGCCAGTAC CGCCTGTGCT CGAACGCGTA GGTGGGCAGA TCGAGCAGCC GTCCCGGCAC
21481 CGGTTCGACC ACCGTGTCCC AGTCCACTGC CGTGCCCAGG GTCCACGCCT GCGCCAACGC
21541 CGTCAGCCAC CGCTCCCAGC CGCCGTCACC GGTCCGCAAC GACGCCACCG TGTGAGCCTG
21601 TTCCATCSCC GGCAGCAGCA CCGGATGGGC GCTGCACTCC ACGAACACGG ACCCGTCCAG
21661 CTCCGCCACC GCCGCGTCCA GCGCGACGGG GCGACGCAGG TTCCGGTACC AGTAGCCCTC
21721 ATCCACCGGC TCGGTCACCC AGGCGCTGTC CACCGTGGAC CACCAGGCCA CCGACCCGGT
21781 CCCGCCGGAA ATCCCCTCCA GTACCTCGGC CAACTCGTCC TCGATGGCTT CCACGTGGGG
21841 CGTGTGGGAG GCGTAGTCGA CCGCGATACG GCGCACTCGC ACGCCTTCGG CCTCGTACCG
21901 CGTCACCACT TCTTCCACCG CGGACGGGTC CCCCGCCACC ACAGTCGAAG ACGGGCCGTT 
21961 ACGCGCCGCG ATCCACACGC CCTCGACCAG GTCCACCTCA CCGGCCGGCA ACGCCACCGA 
22021 AGCCATCGCC CCCCGCCCGG CCAGCCGCCC GGCGATCACC TGGCTGCGCA AGGCCACCAC
22081 GCGGGCGGCG TCCTCAAGGC TGAGGGCTCC GGCCACACAC GCCGCCGCGA TCTCGCCCTG
22141 GGAGTGTCCG ACCACCGCGT CCGGCACGAC CCCATGCGCC TGCCACAGCG CGGCCAGGCT
22201 CACCGCGACC GCCCAGCTGG CCGGCTGGAC CACCTCCACC CGCTCCGCCA CATCCGGCCG
22261 CGCCAACATC TCCCGCACAT CCCAGCCCGT GTGCGGCAAC AACGCCCGCG CACACTCCTC
22321 CATACGAGCC GCGAACACCG CAGAACACGC CATCAACTCC ACACCCATGC CCACCCACTG 

22381 AGCACCCTGC CCGGGAAAGA CGAACACCGT ACGCGGCTGA TCCACCGCCA CACCCATCAC
22441 CCGGGCATCG CCCAACAACA CCGCACGGTG ACCGAAGACA GCACGCTCAC GCACCAACCC 
22501 CTGCGCGACC GCGGCCACAT CCACACCACC CCCGCGCAGA TACCCCTCCA GCCGCTCCAC 
22561 CTGCCCCCGC AGACTCACCT CACTCCGAGC CGACACCGGC AACGGCACCA ACCCATCGAC
22621 AGCCGACTCC CCACGCGACG GCCCGGGAAC ACCCTCAAGG ATCACGTGCG CGTTCGTACC
22681 GCTCACCCCG AAAGCGGAGA CACCGGCCCG GCGCGGACGT CCCGCGTCGG GCCACGCCCG
22741 CGCCTCGGTG AGCAGTTCCA CCGCGCCCTC GGTCCAGTCC ACATGCGACG ACGGCTCGTC
22801 CACATGCAGC GTCTTCGGCG CGATGCCATA CCGCATCGCC ATGACCATCT TGATGACACC
22861 GGCGACACCC GCAGCCGCCT GCGCATGACC GATGTTCGAC TTCAACGAAC CCAGCAGCAG
22921 CGGAACCTCA CGCTCCTGCC CGTACGTCGC CAGAATCGCG TGCGCCTCGA TGGGATCGCC
22981 CAGCGTCGTC CCCGTCCCGT GCGCCTCCAC CACGTCCACG TCGGCGGGGG CGAGCCCCGC
23041 CTTGTGGAGG GCCTGGCGGA TGACGCGCTG CTGGGAGGGG CCGTTGGGTG CGGAGATGCC 
23101 GTTGGAGGCG CCGTCCTGGT TGACGGCGGA GGAGCGGACG ACCGCGAGGA CGGTGTGTCC 
23161 GTTGCGCTCG GCGTCGGAGA GCTTTTCGAC GACGAGGACG CCGGCCCCCT CGGCGAAACC 
23221 GGTGCCGTCC GCCGCGTCAG CGAACGCCTT GCACCGTCCG TCCGGCGCGA CGCCGCCCTG 
23281 CCGGGAGAAC TCCACGAAGG TCTGTGGTGA TGCCATCACT GTGACACCAC CGACCAGCGC
23341 CAGCGAGCAC TCCCCGGTCC GCAGCGCCTG CCCGGCCTGG TGCAGCGCGA CCAGCGACGA 
23401 CGAACACGCC GTGTCGACCG TGACCGCCGG ACCCTCCATG CCGAAGAAGT ACGACAGCCG
23461 TCCGGCGAGC ACCGCGGGCT GTGTGCTGTA GGCGCCGAAT CCGCCCAGGT CCGCGCCCGT 
23521 GCCGTAGCCG TAGTAGAAGC CGCCGACGAA GACGCCGGTG TCGCTGCCGC GCAGGGTGTC 
23581 CGGCACGATG CCGGCGTGTT CGAGCGCCTC CCAGGCGATT TCGAGGAGGA TCCGCTGCTG
23641 CGGGTCGAGT GCGGTGGCCT CGCGCGGACT GATGCCGAAG AACGCGGCAT CGAAGTCGGC 
23701 GGCGCCCGCG AGTGCGCCGG CCCGCCCGGT GGCGGACTCG GCGGCGGCGT GCAGCGCGGC 
23761 CACGTCCCAG CCGCGGTCGG TGGGGAAGTC GCCGATCGCG TCGCGGCCGT CCGCGACGAG 
23821 CTGCCACAGC TCTTCCGGTG AGGTGACGCC GCCCGGCAGT CGGCAGGCCA TGCCGACGAC 
23881 GGCGAGCGGC TCGTTCGCCG CGGCGCGCAG CGCGGTGTTC TCCCGGCGGA GCTGCGCGTT
23941 GTCCTTGACC GACGTCCGCA GCGCCTCGAT CAGGTCGTTC TCGGCCATCG CCTCATCCCT 
24001 TCAGCACGTG CGCGATGAGC GCGTCTGCGT CCATGTCGTC GAACAGTTCG TCGTCCGGCT
24061 CCGCGGTCGT GGTGCTCGCG GGTGCCTGTG CCGGTGGTTC ACCGCCGTCC GGGGTCCCGT
24121 TGTCGTCCGG GGTCCCGTTG ACGTCCGGGG CCAGGAGGGT CAGCAGATGA CGGGTGAGCG
24181 CGCCGGCGGC GGGATAGTCG AAGACGAGCG TGGCCGGCAG CGGAATGCCG AGGGCCTCGG
24241 AGAGCCGGTT GCGCAGGCCG AGCGCGGTGA GCGAGTCGAC CCCGAGGTCC TTGAACGCCG
24301 TGGTGGCCGT GACCGCCGCC GCGTCGGTGT GGCCCAGCAG GGTGGCGGCG GTGTCGCGGA
24361 CGACGCCGAG CAGCACCTGT TCCCGTTCCT TGTGGGGCAG GTCCGGCAGG CGTTCCAGCA
24421 GGGAGCCGCC GTCGGTCGCG GAGCGCCGGG TGGGGCGCTG GATCGGTCGC CACAGCGGTG
24481 ACGGGTCGCC GGGCCCGGGT GGGGCGGTCG CCACGACCAC GGCTTCCCCG GTGGCGCACG
24541 CGGCGTCGAG GAGGTCGGTC AGCCGGTCCG CCGCGGCGGT GAACGCCACG GCCGGCAGGC
24601 CTTGTGCCCG GCGCAGGTCG GCCAGGGCCT GGAGCGGTCC GGCCGCCTCG CCGGACGGAA
24661 CGGCGAGAAC GAACGCGGTC AGGTCGAGGT CGCGGGTCAG GCGGTGCAGT TCCCAGGCCG
24721 ACTCGGCGGT GCCGTCCGCG TGGACGACCG CGGTCACCGG GGTTTCCGGC ACTGTGCCCG
24781 GCTCGTACCG GATCACTTCG GCGCCGTGTC CGCCGAGGTG TCCGGCGAGT TCCTCCGAAC
24841 CGCCCGCGAG GAGGACGGTG TCGCCGTACG AGGCCGCGGC CGTGGTGGGC GCGGCGGGGA
24901 CGAGGCGGGG CGCTTCGAGG CGCCCGTCGG CCAGGCGCAG GTGCGGTTCG TCGAGGCGGG
24961 AGAGGGCGGC GGCGCGGCGG GGGGTGACCG TGTCGGTGGT CTCCACGAGC ACGAGCCGGC
25021 CCGGTTCCGC GGTGTCGAGC AGTGCGGCGA CGGCACCGGC GACGGGCCCG GCCTCGGCGG
25081 ACACCACCAG CGTGGCGCCG GCGGTCCTCG GGTCGTCCAG TGCGGTACGG ACCTCGTCGG
25141 GACCGGATAC CGGGACGACG ATGACGTCGG GCGTGGCGTC GTCGCCGAGG TCGGTGTACC
25201 GGCGGGCCGT GGTGCCGGGT GCCGCCGGGG CCCGGACGCC GGTCCAGGTG CGCCGGAACA
25261 3CCGCACGTC CCCGTCCGGG CCCGTCGTGG CGGGGGGCCG GGTGATGAGC GAGCCGATCT
25321 GAGCCACCGG CCGTCCCAGT TCGTCGGCGA GGTGCACGCG GGCGCCGCCC TCGCCCTCGC
25381 CGTGGACGAA GGTGACGCGC AGTTTCGTGG CGCCGCTGGT GTGGACACGG ACGCCGGTGA
25441 ACGCGAACGG CAACCGTACC CCCGCGTTCt CGGCGGCCGC GCCGATGCTG CCCGCTTGC£
25501 GCGCGGTGAC GAGCAGCGCC GGGTGCAGTG TGTAGCGGGC GGCGTCCCTG GCGAGGGCGC
25561 CGTCGAGGGC GACTTCGGCG CAGACGGTGT CTCCGTGGCT CCACGCGGCG GACATGCCGC
25621 GGAACTCGGG GCCGAACTCG TATCCCGCGT CGTCGAGTCG CTGGTAGAAG GCCGCGACG~
25681 CGACCGGTTC CGCGTGCTCG GGCGGCCAGG GCCCCGGCGT GGTGGCCGGT TCGGTGGTGG
25741 CGATGCCGGC GAAGCCGGAG GCGTGGCGGG TCCATGTCCG GTCGCCGTCC GTCCGGGCGT
25801 GGACGCGCAC GGCACGGCGT CCGGTGTCGT CGGGCGCGGC GACGGTCACG CGCACCTGGA
25861 CGGCGCCGGT GGCGGGCAGG ACCAGCGGTG TCTCGACGAC CAGTTCGTCG AGCAGGTCGC
25921 AGCCTGCCTC GTCGGCGCCG CGTCCGGCCA ATTCCAGGAA GGCGGGTCCG GGCAGCAGTA
25981 CGGCGCCGTC GACGGAGTGA CCGGCCAGCC ATGGGTGGGT GGCCAGCGAG AACCGGCCGG
26041 TGAGCAGCAC CTCGTCGGAG TCGGGGAGCG CCACCGACGC GGCGAGCAGC GGGTGGTCGA
26101 CGGCGTCGAG TCCGAGGCCG GAAGCGTCCG TGCCGGCCGC GGTCTCGATC CAGTAGCGCT
26161 CATGGTGGAA GGCGTATGTG GGCAGGTCGT GTGCCGTCGC CGTCGCGGGG ACGACCGCCG
26221 CCCAGTCGAC GGGCACGCCG GTTGTGTGCG CCTCGGCCAG CGCGGTGAGC AGCCGGTGGA
26281 CTCCCCCGCC GCGGCGGAGC GTGGCGACGG TCGCGCCGTC GATCGCGGGC AGCAGCACGG
26341 GGTGCGCGCT GACCTCGACG AACACGGTGT CACCCGGCTC GCGGGCAGCG GTCACGGCCG
26401 TGGCGAAGCC TACGGGGTGG CGCATGTTGC GGAACCAGTA CTCGTCGTCG AGCGGCGCGT
26461 CGATCCAGCG TTCGTCGGCG GTGGAGAACC ACGGGATCTC GGGCGTGCGC GAGGTGGTGT
26521 CCGCGACGAT CCGCTGGAGT TCGTCGTACA GCGGGTCGAC GAACGGGGTG TGGGTCGGGC
26581 AGTCGACGGC GATGCGGCGC ACCCAGACGC CGCGGGCCTC GTAGTCGGCG ATCAGCGTTT
26641 CGACGGCGTC CGGGCGCCCG GCGACGGTCG TGGTGGTGGC GCCGTTGCGG CCCGCGACCC
26701 AGACGCCGTC GATCCGGGCG GCATCCGCCT CGACGTCGGC GGCCGGGAGC GCGACCGAGC
26761 CCATCGCGCC GCGTCCGGCG AGTTCGCGCA GGAGCAGGAG AACGCTGCGC AGCGCGACGA
26821 GGCGGGCACC GTCCTCCAGG GTGAGCGCTC CGGCGACACA GGCCGCGGCG ATCTCGCCCT
26881 GGGAGTGTCC GATGACGGCG TCCGGGCGTA CGCCCGCGGC CTCCCACACG GCGGCCAGCG
26941 ACACCATGAC GGCCCAGCAG ACGGGGTGCA CGACGTCGAC GCGGCGGGTC ACCTCCGGGT
27001 CGTCGAGCAT GGCGATGGGG TCCCAGCCCG TGTGCGGGAT CAGCGCGTCG GCGCATTGGC
27061 GCATCCTGGC GGCGAACACC GGGGAGGCCG CCATCAGTTC GACGCCCATG CCGCGCCACT
27121 GCGGTCCTTG TCCGGGGAAG ACGAAGACGG TGCGCGGCTC GGTGAGCGCC GTGCCGGTGA
27181 CGACGTCGTC GTCGAGCAGC ACGGCGCGGT GCGGGAACGT CGTACGCCTG GCGAGCAGGC
27241 CCGCGGCGAT GGCGCGCGGG TCGTGGCCGG GACGGGCGGC GAGGTGCTCG CGGAGTCGGC
27301 GGACCTGGCC GTCGAGGGCC GTGGCGGTCC GCGCCGAGAC GGGCAGTGGT GTGAGCGGCG
27361 TGGCGATCAG CGGCTCACCG GGCTTCGAGG CCGACGGCTC CTCGGCCGGC GGCTCCCCGG
27421 CCGGGTGGGC TTCCAGCAGG ACGTGGGCGT TGGTGCCGCT GACGCCGAAG GAGGACACAC
27481 CGGCGCGCCG CGGGCGGTCG GTCTCGGGCC AGGGCCGGGC ATCGGTGAGG AGTTCGACGG
27541 CGCCGGCCGT CCAGTCGACG TGCGAGGACG GCGTGTCCAC GTGCAGGGTG CGCGGCAGGG
27601 TGCCGTGCCG CATGGCGAGG ACCATCTTGA TGACACCGGC GACACCCGCG GCGGCCTGAG
27661 TGTGGCCGAT GTTGGACTTC AGCGAGCCCA GCAGCACCGG GGTGTCGCGC CCCTGCCCGT
27721 AGGTGGCCAG CACCGCCTGT GCCTCGATGG GATCGCCCAG CCTGGTGCCG GTGCCGTGCG
27781 CCTCCACGGC GTCCACGTCC GCCGGGGTGA GCCCGGCGTT GGCCAGGGCC TGCCGGATCA 
27841 CCCGCTCCTG CGAGGGCCCG TTCGGCGCCG ACAACCCGTT GGAAGCACCG TCCTGGTTGA 
27901 CCGCCGAACC CCGGACAACC GCCAGCACAC GGTGGCCGTT GCGCTCGGCA TCGGAGAGCC 

27961 TCTCGACGAT CAGCACACCG GACCCCTCGG CGAAACCGGT GCCGTCAGCC GCATCCGCGA
28021 ACGCCTTGCA GCGCGCGTCG GGCGCGAGAC CCCGCTGCTG GGAGAACTCG ACGAAGCCGG
28081 ACGGCGAGGC CATCACCGTG ACGCCGCCGA CCAGGGCGAG CGAGCATTCG CCGGAGCGCA
28141 GTGACTGCCC GGCCTGGTGC AGCGCCACCA GCGACGACGA ACACGCCGTG TCGACCGTGA
28201 CCGCCGGACC CTCCAGACCG TAGAAGTACG ACAGCCGACC GGACAGCACA CTGGTCTGGG 
28261 TGCCGGTCGC GCCGAAACCG CCCAGGTCGG TGCCGAGTCC GTACCCGTCG GAGAAGGCGC 
28321 CCATGAACAC GCCGGTGTCG CTTCCGCGCA GCGACTCCGG GAGGATCCCG GCGTGTTCCA 
28381 GCGCCTCCCA CGAGGTCTCC AGGACCAGAC GCTGCTGCGG GTCCATCGCC AGCGCCTCAC 
28441 GCGGACTGAT CCCGAAGAAC GCCGCGTCGA AGTCCGCCAC CCCGGCGAGG AAGCCACCAT
28501 GACGCACGGT CGACGTGCCC GGATGATCCG GATCGGGATC GTACAGCCCG TCCACGTCCC
28561 AACCACGGTC CGTCGGAAAC GCCGTGATCC CGTCACCACC CGACTCCAGC AGCCGCCACA
28621 AGTCCTCCGG CGACGCGACC CCACCCGGCA GCCGGCAGGC CATCCCCACG ATCGCCAACG
28681 GCTCGTCCTG CCGGACGGCC GCGGTCGTGG TGCGGGTCGG CGATGCCGTC CGGCCGGACA
28741 GCGCCGCGGT GAGCTTCGCC GCGACGGCGC GCGGCGTCGG GAAGTCGAAG ACCGCGGTGG 
28801 CGGGCAGCCG TACGCCCGTC GCCTCGGTGA AGGCGTTGCG CAGCCGGATC GCCATGAGCG
28861 AGTCGACGCC GAGTTCCTTG AACGTGGCGG TCGCCTCGAC CCGTGCGGCA CCGTCGTGGC
28921 CGAGTACGGC CGCGGTGCAC TGCCGGACGA CGGCGAGCAC GTCCTTTTCG GCGTCCGCGG
28981 CGGAGAGCCG CGCGATCCGG TCGGCGAGGG TGGTGGCGCC GGCCGCCCGG CGCCGCGGCT
29041 CCCGGCGCGG TGCGCGCAGC AGGGGCGAGC TGCCGAGGCC GGCCGGGTCG GCGGCGACCA
29101 GCGCCGGGTC CGAGGACCGC AACGCCGCGT CGAACAGCGT CAGTCCGCCT TCGGCGG~CA 
29161 GCGCCGTCAC GCCGTCGCGG CGCATGCGGG CGCCGGTGCC GACCGTCAGC CCGC^C^CCG 
29221 GTTCCCACAG GCCCCAGGCC ACGGACAACG CGGGC^.GTCC GGCTGCCCGG ~GCTG~~~ G G
29281 CCAGCGCGTC GAGGAACGCG TTCGCGGCCG CGTAGTTGCC CTGTCCGGGG CTGCCGAGCA
29341 CACCGGCGGC CGACGAGTAG AGGACGAACG CGGCCAGTTC CGTGTCCTGG GTGAGTTCGT
29401 GCAGGTGCCA CGCGGCGTCC ACCTTCGGGC GCAGCACCGT CTCGAGCCGG TCGGGGGTGA
29461 GCGCGGTGAG GACGCCGTCG TCGAGGACGG CCGCGGTGTG CACGACGGCC GTGAGCGGGT
29521 GCGCCGGGTC GATCCCCGCC AGTACGGAGG CGAGTTCGTC CCGGTCGGCG ACGTCGCAGG 
29581 CGATCGCCGT GACCTCGGCG CCGGGCACGT CGCTCGCCGT GCCGCTGCGC GACAGCATCA
29641 GCAGCCGGCG CACGCCGTGG CGTTCGACGA GGTGGCGGCT GATGATGCCG GCCAGCGTCC
29701 CGGAGCCACC GGTGACGAGC ACGGTGCCGT CCGGGTCGAG CGCCGGAGCG TCACCCGCCG
29761 GGACCGCCGG GGCCAGACGG CGGGCGTACA CCTGGCCSTC ACGCAGCACC ACCTGGGGCT
29821 CATCGAGCGC GGTGGCGGCT GCGAGCAGCG GCTCGGCGGT GTCCGGGGCG GCGTCGACGA
29881 GGACGATCCG GCCGGGGTGT TCGGCCTGCG CGGTCCGCAC CAGTCCGGCG GCCGCGGCCG
29941 ACGCGAGACC GGGCCCGGTG TGGACGGCCA GGACCGCGTC GGCGTACCGG TCGTCGGTGA 
30001 GGAAGCGCTG CACGGCGGTC AGGACGCCGG CGCCCAGTTC GCGGGTGTCG TCGAGCGGGG 
30061 CACCGCCGCC GCCGTGCGCG GGGAGGATCA CCACG7CCGG GACCGTCGGG TCGTCGAGGC 
30121 GGCCGGTCGT CGCGGTCGTG GGCGGCAGCT CCGGGAGCTC GGCCAGCACC GGGCGCAGCA 
30181 GGCCCGGAAC GGCTCCCGTG ATCGTCAGGG GGCGCCTGCG CACGGCGCCG ATGGTGGCGA
30241 CGGGCCCGCC GGTCTCGTCC GCGAGGTGTA CGCCGTCAGC GGTGACGGCG ACGCGTACCG 
30301 CCGTGGCGCC GGTGGCGTGG ACGCGGACGT CGTCGAACGC GTACGGAAGG TGGTCCCCTT 
30361 CCGCGGCGAG GCGGAGTGCG GCGCCGAGCA GCGCCGGGTG CAGGCCGTAC CGTCCGGCGT 
30421 CGGCGAGCTG TCCGTCGGCG AGGGCCACTT CCGCCCAGAC GGCGTCGTCG TCGGCCCAGA
30481 CGGCGCGCGG GCGGGGCAGC GCGGGCCCGT CCGTGTACCC GGCTCGGGCC AGACGGTCGG
30541 CGATGTCGTC GGGGTCCACC GGCCGGGCCG TGGCGGGCGG CCACGTCGAC GGCATCTCCC 
30601 GCACGGCCGG GGCCGTCCGC GGGTCGGGGG CGAGGATTCC GTGCGCGTGC TCGGTCCACT 
30661 CCCCCGCCGC GTGCCGCGTG TGCACGGTGA CCGCGCGGCG GCCGTCCGCC CCGGGCGCGC 
30721 TCACCGTGAC GGAGAGCGCG AGCGCACCGG ACCGCGGCAG CGTGAGGGGG GTGTCCACGG 
30781 TGAACGTGTC GAGGGCGCCG CAGCCGGCTT CGTCGCCCGC CCGGATCGCC AGATCCAGGA
30841 GGGCCGCGGC GGGCAGCACC GCGAGGCCGT GCAGGGAGTG CGCCAGCGGA TCGGCGGCGT
30901 CGACCCGGCC GGTGAGCACC AGGTCGCCGG TGCCGGGCAG GGTGACCGCC GCGGTCAGCG 
30961 CCGGGTGCGC GACCGGCGTC TGTCCGGCCG GGGCCGCGTC GCCCGCGGTC TGGGTGCCGA 
31021 GCCAGTAGCG GACCCGCTCG AACGGGTACG TCGGCGGGTG CGAGGCGCGT GCCGGCGCGG 
31081 GGTCGATGAC CTTCGGCCAG TCGACCGTGA CGCCGTCGGT GTGCAGCCGG GCGAGCGCGG
31141 TCAGGGCGGA TCGCGGTTCG TCGTCGGCGT GCAGCATCGG GATGCCGTCG ACGAGTCGGG 
31201 TCAGGCTCCG GTCCGGGCCG ATCTCCAGGA GCACCGCCCC GTCGTGCGCG GCGACCTGTT 
31261 CCCCGAACCG GACGGTGTCG CGGACCTGTC GTACCCAGTA CTCCGGCGTG GTGCAGGCGG 
31321 CGCCCGCGGC CATCGGGATC CTCGGCTCGT GGTACGTCAG GCTCTCCGCG ACCTTGCGGA 
31381 ACTCCTCGAG CATCGGCTCC ATCCGCGCCG AGTGGAACGC GTGGCTGGTC CGCAGGCGGG
31441 TGAAGCGGCC GAGCCGGGCC GCGACGTCGA GCACCGCCTC CTCGTCACCG GAGAGCACGA
31501 TCGACGCGGG CCCGTTGACC GCGGCGATCT CCACGCCGTC CCGCAGCAGC GGCAGCGCGT 
31561 CCCGTTCCGA CGCGATCACG GCGGCCATCG CCCCGCCGGA CGGCAGCGCC TGCATCAGGC 
31621 GGGCCCGTGC GGACACCAGC CTGCACGCGT CCTCCAGGGA CCAGACGCCG GCGACGTACG 
31681 CGGCGGCCAG CTCGCCGATC GAATGGCCCA CGAAGGCGTC CGGGCGTACG CCCCACGCCT
31741 CGAGCTGTGC GCCGAGTGCG ACCTGGAGCG CGAACACCGC GGGCTGGGCG TACCCGGTGT 
31801 CGTGGAGGTC GAGCCCGGCG GGCACGTCGA GGGCGTCCAG CACCTCGCGG CGAGTGCGGG 
31861 CGAAGACGTC GTAGGCGGCG GCCAGTCCGT CGCCCATGCC GGGACGTTGT GAGCCCTGTC
31921 CGGAGAAGAG CCACACGAGG CGGCGGTCCG GTTCTGCGGC GCCGGTGACC GTGTCGGTGC
31981 CGATCAGCGC GGCCCGGTGC GGGAAGGCCG TGCGGGCGAG CAGGGCCGCG GCCACCGCGC
32041 GCTCGTCCTC CTCGCCGGTG GCGAGGTGGG CGCGCAGGCG GTGTACCTGT GCGTCGAGTG 
32101 CCTGCGGGG? GCGTGCCGAG AGCAGCAGGG GCAGCGG7CC GGTGTCGGGT GCCGGGGCGG 
32161 GTTCGGGGGC CGGTCGGGGG TGGCTTTCGA GGATGA7GTG AGCGTTGGTG CCGCTAACGC
32221 CGAAGGAGGA CACCCCGGCG CGCCGTGGGC GGTCGG77TC GGGCCAGGGG CGGGCGTCGG
32281 TGAGGAGTTC GACGGCGCCG GCCGTCCAGT CGACGTGCGA GGACGGCGTG TCCACGTGCA
32341 GGGTGCGCGG CAGGGTGCCG TGCCGCATGG CGAGGACCAT CTTGATGACA CCGGCGACGC
32401 CCGCGGCGGC CTGAGTGTGG CCGATGTTGG ACTTCAGCGA GCCCAGCAGC ACCGGGGTGT
32461 CGCGATGCTG CCCGTAGGTG GCCAGTACCG CCTGCGCCTC GATGGGGTCG CCCAGCCTGG
32521 TCCCGGTGCC ATGCGCCTCG ACAGCGTCCA CATCCGCCGG GGTGAGCCCG GCGTTGGCCA
32581 GCGCCTGCCG GATCACCCGC TCCTGCGACG GCCCGTTCGG CGCCGACAAC CCGTTGGAAG
32641 CACCG7CCTG GTTGACCGCC GAACCACGCA CGACCGCCAG GACATTGTGG CCGTGCCGCT
32701 CGGCGTCGGA GAGCCTCTCG ACGATCAGCA CACCGGATCC CTCGGCGAAA CCGGTGCCAT 
32761 CAGCCGCATC CGCGAACGCC TTGCAGCGGC CGTCCGGGGA GAGGCCCCGC TGCTGGGAGA 
32821 AGTCCACGAA GCCGGACGGC GAGGCCATCA CCGTGACGCC GCCGACCACG GCGAGCGAGC 
32881 ACTCCCCCGA GCGCAGCGAC TGCCCGGCCT GGTGCAGCGC CACCAGCGAC GACGAACACG 
32941 CCGTGTCCAC CGTGACCGCC GGACCCTCCA AACCGTAGAA GTACGACAGC CGACCGGACA
33001 GCACACTGGT CTGGGTGCTG GTGGCACCGA AACCGCCGCG GTCGGCTCCA GTGCCGTACC 
33061 CGTAGAAGTA GCCGCCCATG AACACGCCGG TGTCGCTTCC GCGCAGCGAC TCCGGGAGGA 
33121 TCCCGGCGTG TTCCAGCGCC TCCCACGAGG TCTCCAGGAC CAGACGCTGC TGCGGGTCCA 
33181 TCGCCAGCGC CTCACGCGGA CTGATCCCGA AGAACGCCGC GTCGAAGTCC GCCACCCCGG 
33241 CGAGGAAGCC ACCATGACGC ACGGTCGACG TGCCCGGATG ATCCGGATCG GGATCGTACA
33301 GCCCGTCCAC GTCCCAACCA CGGTCCGTCG GAAACGCCGT GATCCCGTCA CCACCCGACT 
33361 CCAGCAGCCG CCACAAGTCC TCCGGCGACG CGACCCCACC CGGCAGCCGG CAGGCCATCC 
33421 CCACGATCGC CAACGGCTCG TCCTGCCGGA CGGCCGCGGT CGGGGTACGC CGCCGGGTGG 
33481 TGGCCCGCGC GCCGGCCAGT TCGTCCAGGT GGGCGGCGAG CGCCTGCGCC GTGGGGTGGT
33541 CGAAGACGAG CGTAGCGGGC AGCGTCAGGC CCGTCGCGTC GGCCAGCCGG TTGCGCAGTT 
33601 CGACGCCGGT CAGCGAGTCG AAGCCCACTT CCCTGAACGC GCGCGCGGGT GCGATGGCGT 
33661 GGGCGTCGCG GTGGCCGAGC ACCGCGGCAG CGCTGGTACG GACGAGGTCG AGCATGTCGC 
33721 GCGCGGCCGG AGGTGCGGAC GTGCGCCGGA CGGCCGGCAC GAGGGTGCGT AGGACCGGCG 
33781 GGACCCGGTC GGACGCGGCG ACGGCGGCGA GGTCGAGCCG GATCGGCACG AGCGCGGGCC 
33841 GGTCGGTGTG CAGGGCCGCG TCGAACAGGG CGAGCCCCTG TGCGGCCGTC ATCGGGGTCA 
33901 TGCCGTTGCG GGCGATGCGG GCCAGGTCGG TGGCGGTCAG CCGCCCGCCC ATCCCGTCCG 
33961 CCGCGTCCCA CAGTCCCCAG GCGAGCGAGA CGGCGGGCAG CCCCTGGTGG TGCCGGTGGC 
34021 GGGCGAGCGC GTCGAGGAAC GCGTTGCCGG TCGCGTAGTT GGCCTGACCC GCGCCGCCGA
34081 ACGTGGCGGA TATGGACGAG TACAGGACGA ACGCGGCCAG GTCGAGATCG CGCGTCAGCT
34141 CGTGCAGGTG CCAGGCGACG TCCGCCTTGA CCCGCAGCAC GGCGTCCCAC TGCTCCGGCC
34201 GCATGGTCGT CACGGCCGCG TCGTCGACGA TCCCGGCCAT GTGCACGACG GCGCGCAGCC
34261 GCTGGGCGAC GTCGGCGACG ACTGCGGCCA GCTCGTCGCG GTCGACGACG TCGGCGGCCA
34321 CGTACCGCAC GCGGTCGTCC TCCGGCGTGT CGCCGGGCCG GCCGTTGCGG GACACCACGA
34381 CGACCTCGGC GGCCTCGTGC ACGGTGAGCA GGTGGTCCAC GAGGAGGCGG CCGAGCCCGC
34441 CGGTGCCGCC GGTGACGAGG ACGGTCCCGC CGGTCAGCGG GGAGGTTCCG GTGGCCGCGG
34501 CGACACGGCG CAGACGGGCC GCACGCGCTG TGCCGTCGGC GACCCGGACG TGCGGCTCGT
34561 CGCCGGCGGC GAGCCCGGCC GCTATGGCGG CGGGCGTGAT CTCGTCCGCT TCGATCAGGG
34621 CGACGCGGCC GGGATGCTCC GTCTCCGCCG TCCGGACCAG GCCGCCGAGC GCTTCCTGCG
34681 CGGGATCGCC GGTACGGGTG GCCACGATGA GCCGGGATCG CGCCCAGCGC GGCTCGGCGA
34741 GCCAGGTCTG CACGGTGGTG AGCAGGTCGC GGCCCAGCTC CCGGGTCCGG GCGCCGGGCG
34801 AGGTGCCCGG GTCGCCGGGT TCCACGGCCA GGACCACGAC CGGGGGGTGC TCGCCGTCGG
34861 GCACGTCGGC GAGGTACGTC CAGTCGGGGA CGGGTGACGC GGGCACGGGC ACCCAGGCGA
34921 TCTCGAACAG CGCCTCGGCA TCGGGGTCGG CGGCCCGCAC GGTCAGGCTG TCGACGTCAA
34981 GGACCGGTGA GCCGTGCTCG TCCGTGGCGA CGATGCGGAC CATGTCGGGG CCGACGCGTT
35041 CCAGCAGCAC GCGCAGCGCG GTCGCGGCGC GCGCGTGGAT CCTCACGCCG GACCAGGAGA 
35101 ACGCCAGCCG GCGCCGCTCC GGGTCCGTGA AGACCGTCCC GAGGGCGTGC AGGGCCGCGT 
35161 CGAGCAGCAC GGGGTGCAGC CCGTACCGGG CGTCGGTGAG CTGTTCGGCG AGGCGGACCG 
35221 ACGCGTAGGC GCGGCCCTCC CCCGTCCACA TCGCGGTCAT GGCCCGGAAC GCGGGCCCGT 
35281 ACGAGAGCGG CAGCGCGTCG TAGAAGCCGG TCAGGTCGGC CGGGTCGGCG TCGGCGGGCG 
35341 GCCAGTCCAC GGGCTCCGCC GGACCGCCAG TGTCCACGCT CAGCGCTCCG GTCGCACTGA 
35401 GCGCCCAGGG GCCCGTGCCG GTACGGCTGT GCAGACTCAC CGACCGCCGT CCGGACACCT
35461 CGGTTCCGAC GGTGGCCTGG ATCTCCGTGT CGCCGTCGCC GTCGACCACC ACCGGCGCGA
35521 CGATGGTCAG CTCCGCGATC TCCGGCGTGC CGAGCCGGGC TCCCGCTTCG GCGAGCAGTT
35581 CCACGAGCGC CGAGCCGGGC ACGATGACCC GGCCGTCCAC CTCGTGGTCG GCGAGCCAGG 
35641 GCTGACGGCG TACCGAGACA CCGCGGTGGC CAGCGCGCCC TCGCCGTCGG GCGAGGTCGA 
35701 CCCACGAGCC GAGCAGCGGG TGGCCGGACG TTCCCGCCGG TTCCGCGTCG ATCCAGTAGC 
35761 GGTCACGGCG GAACGGGTAC GTGGGCAGCG GCACCACCCG ACGCGTCGCG AACGACCAGG 
35821 TGACGGGCAC GCCCCGGACC CAGAGCGCGG CGAGCGACCG AGTGAAGCGG TCCAGGCCGC 
35881 CCTCGCCTCG CCGCAGTGTG CCGGTGACGA CCGTATGCGC ATGCCCGGCG AGCGTGTCCT 
35941 CCAGTGCGGT GGTGAGCACG GGATGCGCGC TGACCTCGAC GAACGCGCGG TATCCGCGGT 
36001 CCGCCAGGTG GCCGGTCGCG GCGGCGAACC GAACGGTGCG GCGCAGGTTG TCGTACCAGT
36061 AGGCGGCGTC CGCGGGCCGG TCCAGCCACG CCTCGTCCAC GGTGGAGAAG AACGGGACGT 
36121 CCGGCGTGCG CGGAGTGATG CCGGCGAGAG CGTCGAGCAG CGCGCCGCGG ATCGTTTCGA 
36181 CATGCGCGGT GTGCGACGCG TAGTCGACGG CGATCCGGCG GGCGCGGGGG GTGGCGGCCA 
36241 G7AGCTCCTC CACGGCGTCG GCCGCACCGG CGACAACGA7 CGACGCGGGT CCGTTGACCG
36301 CGGCGACCTC CAGGCGCCCG GCCCACACGG CGGCGTCGAA GTCGGCGGGC GGCACCGAGA 
36361 CCATGCCGCC CTGCCCGGCC AGTTCGGTGG CGACGAGTCG GCTGCGCACC GCGACGACCT 
36421 77GCGGCGTC GTCCAGGGTG AGCACCCCGG CGACGCAGGC CGCGGCGACT TCGCCCTGGG
36481 AGTGGCCGAC GACCGCGGCC GGGGCGACCC CGTGCGCACG CCACAGCTCC GCCAGCGCCA
36541 CCATCACCGC GAACGACGCG GGCTGCACGA CATCGACCCG GTCGAACGCG GGCGCTCCGG
36601 GCCGCTGGGC GATGACGTCC AGCAGGTCCC ATCCGGTGTG CGGGGCGAGC GCCGTGGCGC
36661 ACTCGCGGAG CCGCCGGGCG AACACGGGCT CGGTGGCGAG CAGTTCGGCA CCCATGCCGG
36721 CCCACTGGGA GCCCTGCCCG GGGAACGCGA ACACGACACG TGTGTCGGTG ACGTCGGCGG 

36781 77CCCGTCAC GGCCCCCGGC ACTTCGGCAC CACGGGCGAA CGCCTCCGCC TCTCGGGCCG
36841 GCACGACCGC CCGGTGGCGC ATGGCCGTCC GGGTGGTGGC GAGCGAGTGG CCGACCGCGG
36901 CCGCGGCGCC AGTGAGCGGG GCCAGCTGTC CCGCGACGTC CCGCAGTCCC TCCGGGGTCC
36961 GGGCCGACAT CGGCCAGACC ACGTCCTCGG GCACCGGCTC GGCTTCGGGT GCGGACACGG
37021 G7GCGGGCGC GGCGGGGGGC CCGGCC7CCA GGACGACATG GGCGTTGGTG CCGCTGATGC 

37081 CGAACGACGA GACACCCGCA CGCCGGGCGC GCCCGGTGAC CGGCCACGGC TCACTGCGGT
37141 GCAGCAGCCG GATGTCGCCG TCCCAGTCGA CGTGCCGGGA CGGCTCGTCG ACGTGCAGCG 
37201 TGCGCGGCAG GACGCCGTGC CGCATCGCCA TGACCATCTT GATGACGCCG GCGACGCCGG 
37261 CCGCGGCCTG GGTGTGGCCG ATGTTCGACT TGAGCGAGCC GATCAGCAGC GGATGCACGC
37321 G7TCGCGCCC GTAGGCCACT TGCAGGGCCT GGGCCTCGAC GGGGTCGCCG AGACGGGTGC

37381 CGGTGCCGTG TGCCTCCACG GCGTCGACGT CACCCGGCGC CAGGCCGGCG TCGGCGAGCG
37441 CACGCTGGAT GACGCGCTGC TGCGCAGGCC CGTTCGGGGC GGACAGCCCG TTCGACGCGC
37501 CGTCGGAGTT GACCGCGGAG CCGCGCACCA GCGCCAGCAC GGGGTGGCCG TGGCGGGTGG 

37561 CGTCGGAGAG CCGCTCCAGC ACCAGGACAC CGGCGCCCTC GGCGAAGCTC GTGCCGTCGG
37621 CGGTGTCCGC GAAGGCCTTG GCACGGCCGT CGGGGGCGAG CCCGCGCTGC CGGGAGAACT
37681 CGACGAACCC GGTCGTCGTC GCCATCACCG TGACACCGCC GACCAGGGCG AGCGAGCACT
37741 CCCCCGAGCG CAGCGACCGC GCGGCCTGGT GCAGCGCCAC CAGCGACGAC GAACACGCCG 
37801 7GTCGACGGT GACCGACGGG CCCTCCAGAC CGAAGTAGTA CGAGAGCCGC CCGGAGAGAA 
37861 CGCTGGTCGG CGTGCCGGTC GCCCCGAAAC CGCCCAGGTC CACGCCCGCG CCGTAGCCCT
37921 GGGTGAACGC GCCCATGAAT ACGCCGGTGT CGCTGCCGCG GACGCTTTCG GGCAGGATGC
37981 CCGCTCGTTC GAACGCCTCC CACGACGCTT CGAGGACCAG ACGCTGCTGC GGGTCCATCG

38041 CCAGCGCCTC ACGCGGGCTG ATCCCGAAGA ACGCGGCGTC GAAGTCGGCG GCGGCGGTGA 
38101 GGAAGCCGCC GTGACGCACG GAAACCTTGC CGACCGCGTC GGGGTTCGGG TCGTAGAGCG 
38161 CGGCGAGGTC CCAGCCGCGG TCGGCGGGGA ACTCGGTGAT CGCGTCCCCG CCGGAGTCGA 
38221 CCAGCCGCCA CAGGTCCTCC GGTGACCGGA CGCCACCGGG CATCCGGCAC GCCATGGCCA 
38281 CGATCGCCAG CGGCTCGTTC CCCGCCACCG TCGGTGCGGG CACTGTCGCC GCCGGAGCGG
38341 CAGGGGCCGG CTCACCCCGC CGTTCCTCAT CCAGGCGGGC GGCGAGCGCG GCCGGTGTCG
38401 GGTGGTCGAA GACGGCCGTC GCGGAGAGCC GTACCCCCGT CGTCTCGGCG AGGCTGTTGC
38461 GCAACCGGAC ACCGCTGAGC GAGTCGATGC CGAGGTCCTT GAACGCCGTC GTGGGCGTGA
38521 7CTCGGAGGC GTCGGCGTGG CCGAGCACGG CGGCCGTGGC CGCACACACG ATGGCCAGCA 

38581 GGTCACGATC GCGGTCGCGG TCGCGGTCGC GGTTGTCCTC CGCACGGGCG GCGATGCGGC
38641 GCTCGGTCCG CTGCCGGACG GGCTCGGTGG GAATGGCCGC GACCATGAAC GGCACGTCCG
38701 CGGCGAGGCT CGCGTCGATG AAGTGGGTGC CCTCGGCCTC GGTGAGCGGC CGGAACCCGT
38761 CGCGCACCCG CTGCCGGTCG GCGTCGTCAA GTTGTCCGGT GAGGGTGCTG GTGGTGTGCC 
38821 ACATGCCCCA GGCGATGGAG GTGGCGGGTT GGCCGAGGGT GTGGCGGTGG GTGGCGAGGG 

38881 CGTCGAGGAA GGCGTTGGCG GCGGCGTAGT TTCCTTGTCC GGGGCTGCCG AGGACGGCGG
38941 CGGCGCTGGA GTAGAGGACG AAGTGGGTGA GGGGTTGGTT TTGGGTGAGG TGGTGCAGGT
39001 GCCAGGCGGC GTTGGCTTTG GGGTGGAGGA CGGTGGTGAG GCGGTCGGGG GTGAGGGCGT
39061 CGAGGATGCC GTCGTCGAGG GTGGCGGCGG TGTGGAAGAC GGCGGTGAGG GGTTGGGGGA 
39121 7GTGGGCGAG GGTGGTGGCG AGTTGGTGGG GGTCGCCGAC GTCGCAGGGG AGGTGGGTGC 

39181 CGGGGGTGGT GTCGGGGGGT GGGGTGCGGG AGAGGAGGTA GGTGTGGGGG TGGTTCAGGT
39241 GGCGGGCGAG GATGCCGGCG AGGGTGCCGG AGCCGCCGGT GATGATGATG GCGTGTTCGG
39301 GGTTGAGGGG GGTGGTGGTG GGTGGGG7GG TGGTGTGGAG GGGGGTGAGG TGGGGTCGGT
39361 GGAGGGTGTG GTGGGTGAGG CGGAGG7GGG GGTGGTCGAG GGTGGCGAGT TGGGCCAGGG
39421 GGAGGGGAGT GTGGGGGTGG TCGGTTTCGA TGAGGCGGAT GCGGTGGGGG TGT7CGTTCT
39481 GGGCGGTGCG GGTGAGGCCG GTGACGG7GG CGCCGGCGGG GTCGGTGGTG GTGTGGACGA
39541 7GAGGGTGTG GTCGGTGGTG GTGAGG7GGT GTTGCAGGGC GGTCAGGACG CGGGTGGCGC 

39601 G3GTGTGGGC GCGGGTGGGT ATGTCC7CGG GGTCGTCGGG GTGGGCGGCG GTGA7CAGGA
39661 G3TGTCCCTC GGGCAGGTCA CCGTCG7AGA CCGCCTCGGC GACCGCGAGC CAC7CCAACC
39721 GGAGCGGGTT CGGCCCCGAC GGGGTGTCGG CCCGCTCCCT CAGCACCAGC GAGTCCACCG
39781 ACACGACAGG ACGGCCATCC GGGTCGGCCA CGCGCACGGC GACGCCGGCC TCCCCCCGGG
39841 TGAGGGCGAC GCGCACCGCG GCGGCCCCGG TGGCGTTCAG GCGCACGCCC GTCCAGGAGA
39901 ACGGCAGCTC GATCCCGCCG CCCGCGTCGA GGCGCCCGGC GTGCAGGGCC GCGTCGAGCA 
39961 GTGCCGGATG CACACCGAAA CCGTCCGCCT CGGCGGCCTG CTCGTCGGGC AGCGCCACCT 
40021 C3GCATACAC GGTGTCACCA TCACGCCAGG CAGCCCGCAA CCCCTGGAAC GCCGACCCGT
40081 ACTCATAACC GGCATCCCGC AGTTCGTCAT AGAACCCCGA GACGTCGACG GCCGCGGCCG
40141 TGGCCGGCGG CCACTGCGAG AACGGCTCAC CGGAAGCGTT GGAGGTATCC GGGGTGTCGG
40201 GGGTCAGGGT GCCGCTGGCG TGCCGGGTCC AGCTGCCCGT GCCCTCGGTA CGCGCGTGGA
40261 CGGTCACCGG CCGCCGTCCG GCCTCATCGG CCCCTTCCAC GGTCACCGAC ACATCCACCG
40321 CTGCGGTCAC CGGCACCACG AGCGGGGATT CGATGACCAG TTCATCCACC ACCCCGCAAC
40381 CGGTCTCGTC ACCGGCCCGG ATGACCAGCT CCACAAACGC CGTACCCGGC AGCAGAACCG
40441 T3CCCCGCAC CGCGTGATCA GCCAGCCAGG GATGCGTACG CAATGAGATC CGGCCGGTGA
40501 GAACAACACC ACCACCGTCG TCGGCGGGCA GTGCTGTGAC GGCGGCCAGC ATCGGATGCG
40561 CCGCCCCGGT CAGCCCGGCC GCGGACAGGT CGGTGGCACC GGCCGCCTCC AGCCAGTACC
40621 GCCTGTGCTC GAACGCGTAG GTGGGCAGAT CCAGCAGCCG CCCCGGCACC GGTTCGACCA
40681 CCGTGCCCCA GTCCACCCCC GCACCCAGAG TCCACGCCTG CGCCAACGCC CCCAGCCACC
40741 GCTCCCAGCC ACCGTCACCA GTCCGCAACG ACGCCACCGT GCGGGCCTGT TCCATCGCCG
40801 GCAGCAGCAC CGGATGGGCA CTGCACTCCA CGAACACCGA CCCGTCCAGC TCCGCCACCG
40861 CCGCATCCAG CGCGACAGGG CGACGCAGGT TCCGGTACCA GTACCCCTCA TCCACCGGCT
40921 CGGTCACCCA GGCGCTGTCC ACGGTCGACC ACCACGCCAC CGACCCGGTC CCGCCGGAAA
40981 TTCCCTTCAG TACCTCAGCG AGTTCGTCCT CGATGGCCTC CACGTGAGGC GTGTGGGAGG
41041 CGTAGTCGAC CGCGATACGA CGCACCCGCA CCCCATCAGC CTCATACCGC GCCACCACCT 
41101 CCTCCACCGC CGACGGGTCC CCCGCCACCA CCGTCGAAGC CGGACCATTA CGCGCCGCGA 
41161 TCCACACACC CTCGACCAGA CCCACCTCAC CGGCCGGCAA CGCCACCGAA GCCATCGCCC 
41221 CCCGGCCGGC CAGCCGCGCC GCGATCACCC GACTGCGCAA CGCCACCACG CGGGCGGCGT 
41281 CCTCCAGGCT GAGGGCTCCG GCCACACACG CCGCCGCGAT CTCCCCCTGC GAGTGTCCGA 
41341 CCACAGCGTC CGGCACGACC CCATGCGCCT GCCACAGCGC GGCCAGGCTC ACCGCGACCG 
41401 CCCAGCTGGC CGGCTGGACC ACCTCCACCC GCTCCGCCAC ATCCGACCGC GACAACATCT
41461 CCCGCACATC CCAGCCCGTG TGCGGCAACA ACGCCCGCGC ACACTCCTCC ATACGAGCCG
41521 CGAACACCGC GGAACGGTCC ATGAGTTCCA CGCCCATGCC CACCCACTGG GCACCCTGCC 
41581 CGGGGAAGAC GAACACCGTA CGCGGCTGAT CCACCGCCAC ACCCATCACC CGGGCATCAC
41641 CCAGCAGCAC CGCACGGTGA CCGAAGACAG CACGCTCACG CACCAACCCC TGCGCGACCG 
41701 CGGCCACATC CACCCCACCC CCGCGCAGAT ACCCCTCCAG CCGCTCCACC TGCCCCCGCA 
41761 GACTCACCTC ACCACGAGCC GACACCGGCA ACGGCACCAA CCCATCACCA CCCGACTCCA
41821 CACGCGACGG CCCAGGAACA CCCTCCAGGA TCACGTGCGC GTTCGTACCG CTCACCCCGA 
41881 ACGACGACAC ACCCGCATGC GGTGCCCGAT CCGACTCGGG CCACGGCCTC GCCTCGGTGA 
41941 GCAGCTCCAC CGCACCGGCC GACCAGTCCA CATGCGACGA CGGCTCGTCC ACGTGCAGCG 
42001 TCTTCGGCGC GATCCCATGC CGCATCGCCA TGACCATCTT GATGACACCG GCGACACCCG
42061 CAGCCGCCTG CGCATGACCG ATGTTCGACT TGACCGAACC GAGGTAGAGC GGCGTGTCGC
42121 GG7CCTGCCC GTAGGCCGCG AGGACGGCCT GCGCCTCGAT CGGGTCGCCC AGCCGCGTGC
42181 CGGTGCCGTG CGCCTCCACC ACGTCCACAT CGGCGGCGCG CAGTCCGGCG TTGACCAACG 
42241 CCTGCCGGAT CACGCGCTGC TGGGCGACGC CGTTGGGGGC GGACAGTCCG TTGGAGGCAC 
42301 CGTCCTGGTT CACCGCCGAG CCGCGGACGA CCGCGAGAAC GGTGTGCCCG TTGCGCTCGG 
42361 CGTCGGAGAG CCGCTCCAGC ACGAGAACGC CGACGCCCTC GGCGAAGCCG GTCCCGTCCG 
4 2421 CCGCGTCGGC GAACGCCTTG CACCGTCCGT CCGGGGAGAG TCCGCGCTGC CGGGAGAACT 
42481 CCACGAGCTC TGCGGTGTTC GCCATGACGG TGACACCGCC GACCAGCGCC AGGGAGCACT 
42541 CCCCGGCCCG CAGTGCCTGT GCCGCCTGGT GCAGGGCGAC CAGCGACGAC GAGCACGCCG
42601 TGTCGACCGT GACCGCCGGG CCCTGAAGTC CGTACACGTA CGAGAGGCGC CCGGACAGGA
42661 CGCTCGTCTG CGTCGCCGTG ACACCGAGCC CGCCCAGGTC CCGGCCGACG CCGTAGCCCT
42721 GGTTGAACGC GCCCATGAAC ACGCCGGTGT CGCTCTCCCG GAGCCTGTCC GGCACGATGC
42781 CGGCGTTCTC GAACGCCTCC CAGGAGGTCT CCAGGATCAG GCGCTGCTGG GGGTCCATCG
42841 CCAGCGCCTC GTTCGGACTG ATGCCGAAGA ACGCGGCGTC GAACCCGGCG CCGGCCAGGA
42901 ATCCGCCGTG GCGTGTCGTG GAGCGGCCGG CCGCGTCCGG GTCCGGGTCG TACAGCGCGT
42961 CGACGTCCCA GCCCCGGTCG GTGGGGAACT CGGTGATCGG CTCGGTACCG GCGGCGACGA
43021 GCCGCCACAG GTCCTCCGGC GAGGCGACCC CGCCGGGCAG TCGGCACGCC ATGCCGACGA
43081 TCGCGACGGG GTCGCCGGAG CCGAGGGTCT GGGCGGTCGC GGGTGCCGCT GTCGCGGAGC
43141 CGGCGAGGTG GGCGGCGAAC GCACGCGGAG TGGGGTGGTC GAACGCGGTT GACGCGGGCA
43201 CCCGCAGACC CGTCCGCGCG GCGACGGTGT TGGTGAACTC GACGGTGGTG AGCGAGTCGA
43261 GGCCGTTCTC GCGGAACGTG CGGTCCGGGG AGCAGTGTCC GGCGCCCGGC AGGCCCAGGA
43321 CGGTGGCGAC GCTGTCGCGG ACCAGGTCGA GCAGTACGTC CTCCCGGCCC GCACGGGCCG
43381 CGGCGAGGCG GTTCGCCCAC TCCTGTTCCG TGGCGTCGGG CTCGGCCGGT CCGGTCAGTG
43441 CGGTGAGGAT CGGCGGCGTG GCGCCCGCCA TCGTCGCGGC CCGCGCCCCG GCGGAACCGG
43501 TCCGGGCCAC GATGTACGAG CCGCCGCCCG CGATGGCCTT CTCGATCAGG TCGCCGG^GA
43561 GCGCCGGCCG TTCGATGCCG GGCAGCGCGC GGACGGTGAC GGTGGGGAGT CCCTCCGCGG
43621 CCCGTGGCCG GGTGTGGGCG TCGGCGCCGG CCGGGCCGTC GAGCAGGACG TGCACGAGCG
43681 CGCCGGGGTT CGCGGCTTCC TCGGCTGCGG TGGTCACGTG GGTGAGGCCG GTCTCGTCGC
43741 GGAGCAGGCC GGCGACGGTG TCGGCGTCCT CCCCGGTGAC CAGGACCGGC GCGTCCGGGC
43801 CGATCGGAGG CGGCACGGTG AGGACCATCT TGCCGGTGTG CCGGGCGTGG CTCATCCACG
43861 CGAACGCGTC CCGCGCACGG CGGATGTCCC ACGGCTGCAC CGGCAGCGGG CACAGCTCAC
43921 CGCGGTCGAA CAGGTCGAGG AGCAGTTCGA GGATCTCCCG CAGGCGCGCG GGATCCACGT
43981 CGGCCAGGTC GAACGGCTGC TGGGCGGCGT GGCGGATGTC GGTCTTGCCC ATCTCGACGA
44041 ACCGGCCGCC CGGTGCGAGC AGGCCGATGG ACGCGTCGAG GAGTTCACCG GTGAGCGAGT
44101 TGAGCACGAC GTCGACCGGC GGGAAGGTGT CGGCGAACGC GGCGCTGCGG GAGTTCGCCA
44161 CATGGTCGGT GTCGAAGCCG TCGGCGTGCA GCAGGTGTTG TTTGGCGGGA CTGGCGGTGG
44221 CGTACACCTC GGCGCCGAGG TGGCGGGCGA TCCGGGTCGC CGCCAT<3CCG ACACCGCCCG
44281 TCGCGGCGTG GACCAGGACC TTCTGGCCGG GTCGCAGCTC GCCCGCGTCG ACGAGGCCGT
44341 ACCAGGCGGT GGCGAACACG ATGGGCACGG ACGCGGCGAT GGGGAACGAC CATCCCCGTG
44401 GGATCCGTGC GACCAGCCGC CGGTCCGCGA CCACGCTGCG CCGGAACGCG TCCTGCACGA
44461 GACCGAACAC GCGGTCGCCG GGGGCCAGGT CGTCGACGCC GGGTCCGACT TCGGTCACGA
44521 TGCCCGCGGC CTCCCCGCCC ATCTCGCCCT CGCCCGGGTA GGTGCCGAGC GCGATCAGCA
44581 CGTCGCGGAA GTTCAGCCCC GCGGCGCGGA CGTCGATGCG GACCTCGCCG GCGGCCAGGG
44641 GCGCGGCGGG ACGTCGAGCG GGGCGACGAC GAGGTCGCGG AGCGTTCCGG AGGCGGGCGG
44701 GCGCAGCGCC CACTGGCGCG GTCGGCAGGG GGGTGGTGTC CGCGCGTACC AGCOGGGGCA
44761 CGTAGGCCAC GCCGGCCCGC AGCGCGATCT GGGGTTCGCC GAGCGAGGCC GCGGCGGGGA
44821 CGAGGTCGTC ATCGCCGTCC GTGTCCACCA GCACGAACGA TCCGGGTTCG GCGGCCTGGC
44881 GGCGCAGCGC CTCGTCCCAG AGCCGGGCCT GGTCCGCGTC CGGGATCTCG GCCGGGCCGA
44941 CGCCCACCGC GCGGCGGGTG ACGACCGTCC GGCGGGGTGA CGGGGTGCCG GGCAGGTCGC
45001 GCCGCTCCCA GACCAGTTCG CACAGCGTGG CCTCGCCACT GCCGGTGGCG ACCAGATGGG
45061 CCGGCAGCCC CGCGAGCCGC GCGCGCTGGA CCTTGCCCGA CGCGGTGCGG GGGATCGTGG
45121 TGACGTGCCA GATCTCGTCG GGCACCTTGA AGTAGGCGAG CCGGCGGCGG CACTCGGCGA
45181 GGATCGCCTC GGCGGGGACG CGGGGGCCGT CGGAAACGAC GTAGAGCACG GGTATGTCGC
45241 CGAGGACGGG GTGCGGGCGG CCCGCCGCGG CGGCGTCCCG GACACCGGCC ACCTCCTGGG
45301 CGACGGTCTC GATCTCCCGG GGGTGGATGT TCTCCCCGCC GCGGATGATC AGCTCCTTGA
45361 CCCGGCCGGT GATCGTCACG TGTCCGGTCT CGGCCTGACG TGCGAGGTCC CCGGTGCGGT
45421 ACCAGCCGTC CACGAGCACC TGGGCGGTCG CCTCCGGCTG GGCGTGGTAG CCGAGCATGA
45481 GGCTCGGCCC GCTCGCCCAC AGCTCGCCCT CCTCGCCGGG TGCCACGTCG GCGCCGGACA
45541 CCGGGTCGAC GAACCGCAGC GACAGGCCCG GCACGGGCAG CCCGCACGAG CCGGGAACCC
45601 GCGCATCCTC CAGGGTGTTG GCGGTGAGCG AGCCGGTCGT CTCGGTGCAG CCGTACGTGT
45661 CGAGCAGGGG CACGCCGAAC GTCGCCTCGA AATCCCTGGT GAGCGACGCC GGCGAGGTGG
45721 ATCCGGCGAC CAGCGCCACG CGCAGCGCGC GAGCCCGCGG CTCGCCGGAC ACGGCGCCGA
45781 GGAGGTAGCG GTACATCGTC GGCACGCCGA CGAGCACGGT GCTGGAGTGT TCGGCCAGGG
45841 CGTCGAGGAC GTCACGCGCG ACGAAGCCGC CCAGGATACG GGCGGACGCG CCGACCGTGA
45901 GGACGGCGAG CAGGCAGAGG TGGTGGCCGA GGCTGTGGAA CAGCGGGGCG GGCCAGAGCA
45961 GTTCGTCGTC CTCGGTCAGC CGCCAGGACG GCACGTCGCA GTGCATCGCG GACCACAGGC
46021 CGCTGCGCTG TGCGGAAACC ACGCCCTTGG GACGGCCGGT GGTGCCGGAG GTGTAGAGCA
46081 TCCAGGCGGG TTCGTCCAGG CCGAGGTCGT CGCGGGGCGG GCACGGCGGC TCGGTCCCGG
46141 CGAGGTCCTC GTAGGAGACG CAGTCCGGTG CCCGGCGCCC GACGAGCACG ACGGTGGCGT
46201 CGGTGCCGGT GCGGCGCACC TGGTCGAGGT GGGTTTCGTC GGTGACCAGC ACGGTCGCGC
46261 CGGAGTCCGT CAGGAAGTGG GCGAGTTCGG CGTCGGCGGC GTCCGGGTTG AGCGGGACGG
46321 CGACGGCGGC GGCGCGGGCG GCGGCGAGGT AGACCTCGAT GGTCTCGATC CGGTTGCCGA
46381 GCAGCATCGC GACCCGGTCG CCGCGGTCGA CGCCGGACGC GGCGAGGTGT CCGGCGAGCC
46441 GGCCGGCCCG GAGCCGGAGT TGCGTGTACG TCACGGCGCG TTGGGAATCC GTGTAGGCGA
46501 TCCGGTCGCC GCGTCGCTCG GCATGGATGC GGAGCAATTC GTGCAACGGC CGGATTGGTT
46561 CCACACGCGC CATGGAAACA CCTTTCTCTC GACCAACCGC ACAACAGCAC GGAACCGGCC
46621 ACGAGTAGAC GCCGGCGACG CTAGCAGCGT TTTCCGGACC GCCACCCCCT GAAGATCCCG
46681 CTACCGTGGC CGGCCTCCCC GGACGCTCAT CTAGGGGGTT GCACGCATAC CGCCGTGCGT
46741 AATTGCCTTC CTGATGACCG ATGCCGGACG CCAGGGAAGG GTGGAGGCGT TGTCCATATC
46801 TGTCACGGCG CCGTATTGCC GCTTCGAGAA GACCGGATCA CCGGACCTCG AGGGTGACGA
46861 GACGGTGCTC GGCCTGATCG AGCACGGCAC CGGCCACACC GACG7GTCGC TGGTGGACGG
46921 TGCTCCCCGG ACCGCCGTGC ACACCACGAC CCGTGACGAC GAGGCGTTCA CCGAGGTCTG
46981 GCACGCACAG CGCCCTGTCG AGTCCGGCAT GGACAACGGC ATCGCCTGGG CCCGCACCGA
47041 CGCGT.-.CCTG TTCGGTGTCG TGCGCACCGG CGAGAGCGGC AGGTACGCCG ATGCCACCGC
47101 GGCCCTCTAC ACGAACGTCT TCCAGCTCAC CCGGTCGCTG GGGTATCCCC TGCTCGCCCG
47161 GACCTGGAAC TACGTCAGCG GTATCAACAC GACGAACGCG GACGGGCTGG AGGTGTACCG
47221 GGAC7TCTGC GTGGGCCGCG CCCAGGCGCT CGACGAGGGC GGGATCGACC CGGCCACCAT
47281 GCCCGCGGCC ACCGGTATCG GCGCCCACGG GGGCGGCATC ACCTGCGTGT TCCTCGCCGC
47341 CCGGGGCGGA GTGCGGATCA ACATCGAGAA CCCCGCCGTC CTCACGGCCC ACCACTACCC
47401 GACGACGTAC GGTCCGCGGC CCCCGGTCTT CGCACGGGCC ACCTGGCTGG GCCCGCCGGA
47461 GGGGGGCCGG CTGTTCATCT CCGCGACGGC CGGCATCCTC GGACACCGAA CGGTGCACCA
47521 CGGTGATGTG ACCGGCCAGT GCGAGGTCGC CCTCGACAAC ATGGCCCGGG TCATCGGCGC
47581 GGAGAACCTG CGGCGCCACG GCGTCCAGCG GGGGCACGTC CTCGCCGACG TGGACCACCT
47641 CAAGGTCTAC GTCCGCCGCC GCGAGGATCT CGATACGGTC CGCCGGGTCT GCGCCGCACG
47701 CCTGTCGAGC ACCGCGGCCG TCGCCCTTTT GCACACCGAC ATAGCCCGCG AGGATCTGCT
47761 CGTCGAAATC GAAGGCATGG TGGCGTGACA ATACCCGGTA AAAGGCCCGC GACGCTGCGC
47821 CTCGGCGGAT CCGCGAAGAG AAAGAAGAGC GTCACCGCAC AGCGCGGCAG CCCGGTCCTT
47881 TCGTCCTTCG CACAGCGGCG GATCTGGTTT CTCCAGCAAT TGGACCCGGA GAGCAACGCC
47941 TATAATCTCC CGCTCGTGCA ACGCCTGCGC GGTCTATTGG ACGCGCCGGC CCTGGAGCGT
48001 GCGCTGGCGC TCGTCGTCGC GCGCCACGAG GCGTTGCGGA CGGTGTTCGA CACCGCCGAC
48061 GGCGAGCCCC TCCAGCGGGT GCTTCCCGCC CCGGAACACC TCCTGCGCCA CGCGCGGGCG
48121 GGCAGCGAGG AGGACGCCGC CCGGCTCGTC CGCGACGAGA TCGCCGCGCC GTTCGACCTC

48181 GCCACCGGGC CGTTGATCAG GGCCCTGCTG ATCCGCCTCG GTGACGACGA CCACGTTCTC
48241 GCGGTGACCG TGCACCATGT CGCCGGCGAC GGCTGGTCGT TCGGGCTCCT CCAACATGAA
48301 CTCGCAGCCC ACTACACGGC GCTGCGCGAC ACTGCCCGCC CTGCCGAACT GCCGCCGTTG
48361 CCGGTGCAGT ACGCCGACTT CGCCGCCTGG GAGCGGCGCG AACTCACCGG CGCCGGACTG
48421 GACAGGCGTC TGGCCTACTG GCGCGAGCAA CTCCGGGGCG CCCCGGCGCG GCTCGCCCTC
48481 CCCACCGACC GTCCCCGCCC GCCGGTCGCC GACGCGGACG CGGGCATGGC CGAGTGGCGG
48541 CCGCCGGCCG CGCTGGCCAC CGCGGTCCTC ACGCTCGCGC GCGACTCCGG TGCGTCCGTG
48601 TTCATGACCC TGCTGGCGGC CTTCCAAGCG GTCCTCGCCC GGCAGGCGGG CACGCGGGAC
48661 GTGCTGGTCG GCACGCCCGT GGCGAACCGT ACGCGGGCGG CGTACGAGGG CCTGATCGGC
48721 ATGTTCGTCA ACACGCTCGC GCTGCGCGGC GACCTCTCGG GCGATCCGTC GTTCCGGGAA
48781 CTCCTCGACC GCTGCCGGGC CACGACCACG GACGCGTTCG CCCACGCCGA CCTGCCGTTC
48841 GAGAACGTCA TCGAACTCGT CGCACCGGAA CGCGACCTGT CGGTCAACCC GGTCGTCCAG
48901 GTGCTGTTGC AGGTGCTGCG GCGCGACGCG GCGACGGCCG CGCTGCCCGG CATCGCGGCC
48961 GAACCGTTCC GCACCGGACG CTGGTTCACC CGCTTCGACC TCGAATTCCA TGTGTACGAG
49021 GAGCCGGGTG GCGCGCTGAC CGGCGAACTG CTCTACAGCC GTGCGCTGTT CGACGAGCCA
49081 CGGATCACGG GGTTGCTGGA GGAGTTCACG GCGGTGCTTC AGGCGGTCAC CGCCGACCCG
49141 GACGTACGGC TGTCGCGGCT GCCGGCCGGC GACGCGACGG CGGCAGCGCC CGTGGTGCCC
49201 TCGAACGACA CGGCGCGGGA CCTGCCCGTC GACACGCTGC CGGGCCTGCT GGCCCGGTAC
49261 GCCGCACGCA CCCCCGGCGC CGTGGCCGTC ACCGACCCGC ACATCTCCCT CACCTACGCG
49321 CAGCTGGACC GGCGGGCGAA CCGCCTCGCG CACCTGCTCC GCGCGCGCGG CACCGCCACC

49381 GGCGACCTGG TCGGGATCTG CGCCGATCGC GGCGCCGACC TGATCGTCGG CATCGTGGGG
49441 ATCCTCAAGG CGGGCGCCGC TTATGTGCCG CTGGACCCCG AACATCCTCC GGAGCGCACG
49501 GCGTTCGTGC TGGCCGACGC GCAGCTGACC ACGGTGGTGG CGCACGAGGT CTACCGTTCC
49561 CGGTTCCCCG ATGTGCCGCA CGTGGTGGCG TTGGACGACC CGGAGCTGGA CCGGCAGCCG
49621 GACGACACGG CGCCGGACGT CGAGCTGGAC CGGGACAGCC TCGCCTACGC GATCTACACG
49681 TCCGGGTCGA CCGGCAGGCC GAAGGCCGTG CTCATGCCGG GTGTCAGCGC CGTCAACCTG
49741 CTGCTCTGGC AGGAGCGCAC GATGGGCCGC GAGCCGGCCA GCCGCACCGT CCAGTTCGTG
49801 ACGCCCACGT TCGACTACTC GGTGCAGGAG ATCTTTTCCG CGCTGCTGGG GGGCACGCTC
49861 GTCATCCCGC CGGACGAGGT GCGGTTCGAC CCGCCGGGAC TCGCCCGGTG GATGGACGAA
49921 CAGGCGATTA CCCGGATCTA CGCGCCGACG GCCGTACTGC GCGCGCTGAT CGAGCACGTC
49981 GATCCGCACA GCGACCAGCT CGCCGCCCTG CGGCACCTGT GCCAGGGCGG CGAGGCGCTG
50041 ATCCTCGACG CGCGGTTGCG CGAGCTGTGC CGGCACCGGC CCCACCTGCG CGTGCACAAT 
50101 CACTACGGTC CGGCCGAAAG CCAGCTCATC ACCGGGTACA CGCTGCCCGC CGACCCCGAC 
50161 GCGTGGCCCG CCACCGCACC GATCGGCCCG CCGATCGACA ACACCCGCAT CCATCTGCTC 
50221 GACGAGGCGA TGCGGCCGGT TCCGGACGGT ATGCCGGGGC AGCTCTGCGT CGCCGGCGTC 

50281 GGCCTCGCCC GTGGGTACCT GGCCCGTCCC GAGCTGACCG CCGAGCGCTG GGTGCCGGGA
50341 GATGCGGTCG GCGAGGAGCG CATGTACCTC ACCGGCGACC TGGCCCGCCG CGCGCCCGAC 
50401 GGCGACCTGG AATTCCTCGG CCGGATCGAC GACCAGGTCA AGATCCGCGG CATCCGCGTC 
50461 GAACCGGG7G AGATCGAGAG CCTGCTCGCC GAGGACGCCC GCGTCACGCA GGCGGCGGTG
50521 TCCGTGCGCG AGGACCGGCG GGGCGAGAAG TTCCTGGCCG CGTACGTCGT ACCGGTGGCC 

50581 GGCCGGCACG GCGACGACTT CGCCGCGTCG CTGCGCGCGG GAC7GGCCGC CCGGCTGCCC
50641 SCCGCGCTCG TGCCCTCCGC CGTCG7CCTG GTGGAGCGAC TGCCGAGGAC CACGAGCGGC
50701 AAGGTGGACC GGCGCGCGCT GCCCGACCCG GAGCCGGGCC CGGCGTCGAC CGGGGCGGTT
50761 ACGCCCCGCA CCGATGCCGA GCGGAC337G TGCCGGATCT TCCAGGAGGT GCTCGACGTC
50821 CCGCGGGTCG GTGCCGACGA CGAC7777TC ACGCTCGGCG GGCACTCCCT GCTCGCCACC 
50881 CGGGTCGTCT CCCGCATCCG CGCCGAGC7G GGTGGCGATG TCCCGCTGCG TACGCTCT^C 
50941 3ACGGGCGGA CGCCCGCCGC GCTCGCCCTGT GCGGCGGACG AGGCCGGCCC GGCCGCCCTG 
51001 CCCCCGATCG CGCCCTCCGC GGAGAACGGG CCGGCCCCCC TCACCGCGGC ACAGGAACAG 
51061 ATGCTGCACT CGCACGGCTC GCTGC7CGCC GCGCCCTCCT ACACGGTCGC CCCGTACGGG 
51121 TTCCGGCTGC GCGGGCCACT CGACCGCGAA GCGCTCGACG CGGCACTGAC CCGGATCGCC 
51181 GCGCGCCACG AGCCGCTGCG GACCGGG77C CGCGATCGGG AACAGGTCGT CCGGCCGCCC 
51241 GCTCCGGTGC GCGCCGAGGT GGTTCC3GTG CCGGTCGGCG ACGTCGACGC CGCGGTCCGG 
51301 GTCGCCCACC GGGAGCTGAC CCGGCCGTTC GACCTCGTGA ACGGGTCGTT GCTGCGTGCC 
51361 GTGCTGCTGC CGCTGGGCGC CGAGGATCAC GTGCTGCTGC TGATGCTGCA CCACCTCGCC 
51421 GGTGACGGAT GGTCCTTCGA CCTCCTGGTC CGGGAGTTGT CGGGGACGCA ACCGGACCTT 
51481 CCGGTGTCCT ACACGGACGT GGCCCGGTGG GAACGGAGTC CGGCCGTGAT CGCGGCCAGG
51541 GAGAACGACC GGGCCTACTG GCGCCGGCGG CTGGGGGGCG CCACCGCGCC GGAGCTGCCC
51601 GCGGTCCGGC CCGGCGGGGC ACCGACCGGG CGGGCGTTCC TGTGGACGCT CAAGGACACC 
51661 GCCGTCCTGG CGGCACGCCG GGTCGCGGAC GCCCACGACG CGACGTTGCA CGAAACCGTG 

51721 CTCGGCGCCT TCGCCCTGGT CGTGGCG3AG ACCGCCGACA CCGACGACGT GCTCGTCGCG
51781 ACGCCGTTCG CGGACCGGGG GTACGCCGGG ACCGACCACC TCATCGGCTT CTTCGCGAAG 

51841 GTCCTCGCGC TGCGCCTCGA CCTCGGCGGC ACGCCGTCGT TCCCCGAGGT GCTGCGCCGG
51901 GTGCACACCG CGATGGTGGG CGCGCAC3CC CACCAGGCGG TGCCCTACTC CGCGCTGCGC 
51961 GCCGAGGACC CCGCGCTGCC GCCGGCCCCC GTGTCGTTCC AGCTCATCAG CGCGCTCAGC
52021 GCGGAACTGC GGCTGCCCGG CATGCACACC GAGCCGTTCC CCGTCGTCGC CGAGACCGTC 
52081 GACGAGATGA CCGGCGAACT GTCGATCAAC CTCTTCGACG ACGGTCGCAC CGTCTCCGGC 
52141 GCGGTGGTCC ACGATGCCGC GCTGC7CGAC CGTGCCACCG TCGACGATTT GCTCACCCGG 
52201 G7GGAGGCGA CGCTGCGTGC CGCCGCGGGC GACCTCACCG TACGCGTCAC CGGTTACGTG 
52261 GAAAGCGAGT AGCCATGCCC GAGCAGGACA AGACAGTCGA GTACCTTCGC TGGGCGACCG
52321 CGGAACTCCA GAAGACCCGT GCGGAAC7CG CCGCGCACAG CGAGCCGTTG GCGATCGTGG
52381 GGATGGCCTG CCGGCTGCCC GGCGGGGTCG CGTCGCCGGA GGACCTGTGG CAGTTGCTGG
52441 AGTCCGGTGG CGACGGCATC ACCGCGTTCC CCACGGACCG GGGCTGGGAG ACCACCGCCG
52501 ACGGTCGCGG CGGCTTCCTC ACCGGGGCGG CCGGCTTCGA CGCGGCGTTC TTCGGCATCA
52561 GCCCGCGCGA GGCGCTGGCG ATGGACCCGC AGCAGCGCCT GGCCCTGGAG ACCTCGTGGG
52621 AGGCGTTCGA GCACGCGGGC ATCGATCCGC AGACGCTGCG GGGCAGTGAC ACGGGGGTGT
52681 TCCTCGGCGC GTTCTTCCAG GGGTACGGCA TCGGCGCCGA CTTCGACGGT TACGGCACCA
52741 CGAGCATTCA CACGAGCGTG CTCTCCGGCC GCCTCGCGTA CTTCTACGGT CTGGAGGGTC
52801 CGGCGGTCAC GGTCGACACG GCGTGTTCGT CGTCGCTGGT GGCGCTGCAC CAGGCCGGGC 
52861 AGTCGCTGCG CTCCGGCGAA TGCTCGC7CG CCCTGGTCGG CGGCGTCACG GTGATGGCCT
52921 CGCCGGCGGG GTTCGCGGAC TTCTCCGAGC AGGGCGGCCT GGCCCCCGAG GCGCGCTGCA
52981 AGGCCTTCGC GGAAGCGGCT GACGGCA7CG GTTTCGCCGA GGGGTCCGGC GTCCTGATCG
53041 TCGAGAAGCT CTCCGACGCC GAGCGCAACG GCCACCGCGT GCTGGCGGTC GTCCGGGGTT
53101 CCGCCGTCAA CCAGGACGGT GCCTCCAACG GGGTGTCCGC GCCGAACGGG CCGTCGCAGG
53161 AGCGGGTGAT CCGGCAGGCC CTGGCCAACG CCGGACTCAC CCCGGCGGAC GTGGACGCCG 
53221 TCGAGGCCCA CGGCACCGGC ACCAGGCTGG GCGACCCCAT CGAGGCACAG GCCGTGCTGG 
53281 CCACCTACGG GCAGGGGCGC GACACCCCTG TGCTGCTGGG CTCGCTGAAG TCCAACATCG 
53341 GCCACACCCA GGCCGCCGCG GGCGTCGCCG GTGTCATCAA GATGGTCCTC GCCATGCGGC
53401 ACGGCACCCT GCCCCGCACC CTGGACGTGG ACACGCCGTC CTCGCACGTC GACTGGACGG
53461 CCGGCGCCGT CGAACTCCTC ACCGACGCCC GGCCCTGGCC CGAAACCGAC CGCCCACGGC
53521 GCGCCGGTGT CTCCTCCTTC GGCGTCAGCG GCACCAACGC CCACATCATC CTCGAAAGCC 
53581 ACCCCCGACC GGCCCCCGAA CCCGCCCCGG CACCCGACAC CGGACCGCTG CCGCTGCTGC 
53641 TCTCGGCCCG CACCCCGCAG GCACTCGACG CACAGGTACA CCGCCTGCGC GCGTTCCTCG 
53701 ACGACAACCC CGGCGCGGAC CGGGTCGCCG TCGCGCAGAC ACTCGCCCGG CGCACCCAGT 
53761 TCGAGCACCG CGCCGTGCTG CTCGGCGACA CGCTCATCAC CGTGAGCCCG AACGCCGGCC
53821 GCGGACCGGT GGTCTTCGTC TACTCGGGGC AAAGCACGCT GCACCCGCAC ACCGGGCGGC
53881 AACTCGCGTC CACCTACCCC GTGTTCGCCG AAGCGTGGCG CGAGGCCCTC GACCACCTCG
53941 ACCCCAGCCA GGGCCCGGCC ACGCACTTCG CCCACCAGAC CGCGCTCACC GCGCTCCTGC
54001 GGTCCTGGGG CATCACCCCG CACGCGGTCA TCGGCCACTC CCTCGGTGAG ATCACCGCCG
54061 CGCACGCCGC CGGTGTCCTG TCCCTGAGGG ACGCGGGCGC GCTCCTCACC AGCCGCACCC 
54121 GCCTGATGGA CCAACTGCCG TCGGGCGGCG CGATGGTCAC CGTCCTGACC AGCGAGGAAA
54181 AGGCACGCCA GGTGCTGCGG CCGGGCG7GG AGATCGCCGC CGTCAACGGC CCCCACTCCC
54241 TCGTGCTGTC CGGGGACGAG GAAGCCGTAC TCGAAGCCGC CCGGCAGCTC GGCATCCACC
54301 ACCGCC7GCC GACCCGCCAC GCCGGCCACT CCGAGCGCAT GCAGCCACTC GTCGCCCCCC
54361 TCCTCGACGT CGCCCGGACC CTGACGTACC ACCAGCCCCA CACCGCCATC CCCGGCGACC
54421 CCACCACCGC CGAATACTGG GCGCACCAGG TCCGCGACCA AGTACGTTTC CAGGCGCACA
54481 CCGAGCAG7A CCCGGGCGCG ACGTTCCTCG AGATCGGCCC CAACCAGGAC C7CTCGCCGC
54541 TCGTCGACGG CGTTGCCGCC CAGACCGGTA CGCCCGACGA GGTGCGGGCG C7GCACACCG
54601 CGCTCGCGCA GCTCCACGTC CGCGGCGTCG CGATCGACTG GACGCTCGTC CTCGGCGGGG
54661 ACCGCGCGCC CGTCACGCTG CCCACGTATC CGTTCCAGCA CAAGGACTAC TGGCTGCGGC
54721 CCACCTCCCG GGCCGATGTG ACCGGCGCGG GGCAGGAGCA GGTGGCGCAC CCGCTGCTCG
54781 GCGCCGCGGT CGCGCTGCCC GGCACGGGCG GAGTCGTCCT GACCGGCCGC CTGTCGCTGG
54841 CC7CCCATCC GTGGCTCGGC GAGCACGCGG TCGACGGCAC CGTGCTCCTG CCCGGCGCGG
54901 CCTTCCTCGA ACTCGCGGCG CGCGCCGGCG ACGAGGTCGG CTGCGACCTG CTGCACGAAC
54961 TCGTCATCGA GACGCCGCTC GTGCTGCCCG CGACCGGCGG TGTGGCGGTC TCCGTCGAGA
55021 TCGCCGAACC CGACGACACG GGGCGGCGGG CGGTCACCGT CCACGCGCGG GCCGACGGCT 
55081 CGGGCCTGTG GACCCGACAC GCCGGCGGAT TCCTCGGCAC GGCACCGGCA CCGGCCACGG 
55141 CCACGGACCC GGCACCCTGG CCGCCCGCGG AAGCCGGACC GGTCGACGTC GCCGACGTCT 
55201 ACGACCGGTT CGAGGACATC GGGTACTCCT ACGGACCGGG CTTCCGGGGG CTGCGGGCCG 
55261 CCTGGCGCGC CGGCGACACC GTGTACGCCG AGGTCGCGCT CCCCGACGAG CAGAGCGCCG 
55321 ACGCCGCCCG TTTCACGCTG CACCCCGCGC TGCTCGACGC CGCGTTCCAG GCCGGCGCGC 
55381 TGGCCGCGCT CGACGCACCC GGCGGGGCGG CCCGACTGCC GTTCTCGTTC CAGGACGTCC
55441 GCATCCACGC GGCCGGGGCG JKCGCGGCTGC GGGTCACGGT CGGCCGCGAC GGCGAGCGCA
55501 GCACCGTCCG CATGACCGGC CCGGACGGGC AGCTGGTGGC CGTGGTCGGT GCCGTGCTGT 
55561 CGCGCCCGTA CGCGGAAGGC TCCGGTGACG GCCTGCTGCG CCCGGTCTGG ACCGAGCTGC 

55621 CGATGCCCGT CCCGTCCGCG GACGATCCGC GCGTGGAGGT CCTCGGCGCC GACCCGGGCG
55681 ACGGCGACGT TCCGGCGGCC ACCCGGGAGC TGACCGCCCG CGTCCTCGGC GCGCTCCAGC
55741 GCCACCTGTC CGCCGCCGAG GACACCACCT TGGTGGTACG GACCGGCACC GGCCCGGCCG
55801 CTGCCGCCGC CGCGGGTCTG GTCCGCTCGG CGCAGGCGGA GAACCCCGGC CGCGTCGTGC 
55861 TCGTCGAGGC GTCCCCGGAC ACCTCGGTGG AGCTGCTCGC CGCGTGCGCC GCGCTGGACG 

55921 AACCGCAGCT GGCCGTCCGG GACGGCGTGC TCTTCGCGCC GCGGCTGGTC CGGATGTCCG
55981 ACCCCGCGCA CGGCCCGCTG TCCCTGCCGG ACGGCGACTG GCTGCTCACC CGGTCCGCCT
56041 CCGGCACGTT GCACGACGTC GCGCTCATAG CCGACGACAC GCCCCGGCGG GCGCTCGAAG
56101 CCGGCGAGGT CCGCATCGAC GTCCGCGCGG CCGGACTGAA CTTCCGCGAT GTGCTGATCG
56161 CGCTCGGGAC GTACACCGGG GCCACGGCCA TGGGCGGCGA GGCCGCGGGC GTCGTGGTGG
56221 AGACCGGGCC CGGCGTGGAC GACCTGTCCC CCGGCGACCG GGTGTTCGGC CTGACCCGGG
56281 GCGGCATCGG CCCGACGGCC GTCACCGACC GGCGCTGGCT GGCCCGGATC GCCGACGGCT
56341 GGAGCTTCAC CACGGCGGCG TCCGTCCCGA TCGTGTTCGC GACCGCGTGG TACGGCCTGG
56401 TCGACCTCGG CACACTGCGC GCCGGCGAGA AGGTCCTCGT CCACGCGGCC ACCGGCGGTG
56461 TCGGCATGGC CGCCGCACAG ATCGCCCGCC ACCTGGGCGC CGAGCTCTAC GCCACCGCCA
56521 GTACCGGCAA GCAGCACGTC CTGCGCGCCG CCGGGCTGCC CGACACGCAC ATCGCCGACT
56581 CTCGGACGAC CGCGTTCCGG ACCGCTTTCC CGCGCATGGA CGTCGTCCTG AACGCGCTGA
56641 CCGGCGAGTT CATCGACGCG TCGCTCGACC TGCTGGACGC CGACGGCCGG TTCGTCGAGA 
56701 TGGGCCGCAC CGAGCTGCGC GACCCGGCCG CGATCGTCCC CGCCTACCTG CCGTTCGACC 
56761 TGCTGGACGC GGGCGCCGAC CGCATCGGCG AGATCCTGGG CGAACTGCTC CGGCTGTTCG 
56821 ACGCGGGCGC GCTGGAGCCG CTGCCGGTCC GTGCCTGGGA CGTCCGGCAG GCACGCGACG
56881 CGCTCGGCTG GATGAGCCGC GCCCGCCACA TCGGCAAGAA CGTCCTGACG CTGCCCCGGC
56941 CGCTCGACCC GGAGGGCGCC GTCGTCCTCA CCGGCGGCTC CGGCACGCTC GCCGGCATCC
57001 TCGCCCGCCA CCTGCGCGAA CGGCATGTCT ACCTGCTGTC CCGGACGGCA CCGCCCGAGG
57061 GGACGCCCGG CGTCCACCTG CCCTGCGACG TCGGTGACCG GGACCAGCTG GCGGCGGCCC 
57121 TGGAGCGGGT GGACCGGCCG ATCACCGCCG TGGTGCACCT CGCCGGTGCG CTGGACGACG 
57181 GCACCGTCGC GTCGCTCACC CCCGAGCGTT TCGACACGGT GCTGCGCCCG AAGGCCGACG 
57241 GCGCCTGGTA CCTGCACGAG CTGACGAAGG AGCAGGACCT CGCCGCGTTC GTGCTCTACT 
57301 CGTCGGCCGC CGGCGTGCTC GGCAACGCCG GCCAGGGCAA CTACGTCGCC GCGAACGCGT 
57361 TCCTCGACGC GCTCGCCGAG CTGCGCCACG GTTCCGGGCT GCCGGCCCTC TCCATCGCCT 
57421 GGGGGCTCTG GGAGGACGTG AGCGGGCTCA CCGCGGCGCT CGGCGAAGCC GACCGGGACC
57481 GGATGCGGCG CAGCGGTTTC CGGGCCATCA CCGCGCAACA GGGCATGCAC CTGTACGAGG
57541 CGGCCGGCCG CACCGGAAGT CCCGTGGTGG TCGCGGCGGC GCTCGACGAC GCGCCGGACG
57601 TGCCGCTGCT GCGCGGCCTG CGGCGGACGA CCGTCCGGCG GGCCGCCGTC CGGGAGTGTT
57661 CG7CCGCCGA CCGGCTCGCC GCGCTGACCG GCGACGAGCT CGCCGAAGCG CTGCTGACGC
57721 TCGTCCGGGA GAGCACCGCC GCCGTGCTCG GCCACGTGGG TGGCGAGGAC ATCCCCGCGA 
57781 CGGGGGCGTT CAAGGACCTC GGCATCGACT CGCTCACCGC GGTCCAGCTG CGCAACGCCC 




57841 TCACCGAGGC GACCGGTGTG CGGCTGAACG CCACGGCGGT CTTCGACTTC CCGACCCCGC 
57901 ACGTGCTCGC CGGGAAGCTC GGCGACGAAC TGACCGGCAC CCGCGCGCCC GTCGTGCCCC
57961 GGACCGCGGC CACGGCCGGT GCGCACGACG AGCCGCTGGC GATCGTGGGA ATGGCCTGCC
58021 GGCTGCCCGG CGGGGTCGCG TCACCCGAGG AGCTG7GGCA CCTCGTGGCA TCCGGCACCG
58081 ACGCCATCAC GGAGTTCCCG ACGGACCGCG GCTGGGACGT CGACGCGATC TACGACCCGG
58141 ACCCCGACGC GATCGGCAAG ACCTTCGTCC GGCACGGTGG CTTCCTCACC GGCGCGACAG 
58201 GCTTCGACGC GGCGTTCTTC GGCATCAGCC CGCGCGAGGC CCTCGCGATG GACCCGCAGC 
58261 AGCGGGTGC7 CCTGGAGACG TCGTGGGAGG CGTTCGAAAG CGCCGGCATC ACCCCGGACT 

58321 CGACCCGCGG CAGCGACACC GGCGTGTTCG TCGGCGCCTT CTCCTACGGT TACGGCACCG
58381 GTGCGGACAC CGACGGCTTC GGCGCGACCG GCTCGCAGAC CAGTGTGCTC TCCGGCCGGC
58441 TGTCGTACT7 CTACGGTCTG GAGGGTCCGG CGGTCACGGT CGACACGGCG TGTTCGTCGT
58501 CGCTGGTGGC GCTGCACCAG GCCGGGCAGT CGCTGCGCTC CGGCGAATGC TCGCTCGCCC 
58561 TGGTCGGCGG CGTCACGGTG ATGGCGTCTC CCGGCGGCTT CGTGGAGTTC TCCCGGCAGC 
58621 GCGGCCTCGC GCCGGACGGC CGGGCGAAGG CGTTCGGCGC GGGTGCGGAC GGCACGAGCT
58681 TCGCCGAGGG TGCCGGTGTG CTGATCGTCG AGAGGCTCTC CGACGCCGAA CGCAACGGTC
58741 ACACCGTCCT GGCGGTCGTC CGTGGTTCGG CGGTCAACCA GGATGGTGCC TCCAACGGGC 
58801 TGTCGGCGCC GAACGGGCCG TCGCAGGAGC GGGTGATCCG GCAGGCCCTG GCCAACGCCG 
58861 GGCTCACCCC GGCGGACGTG GACGCCGTCG AGGCCCACGG CACCGGCACC AGGCTGGGCG
58921 ACCCCATCGA GGCACAGGCG GTACTGGCCA CCTACGGACA GGAGCGCGCC ACCCCCCTGC
58981 TGCTGGGCTC GCTGAAGTCC AACATCGGCC ACGCCCAGGC CGCGTCCGGC GTCGCCGGCA
59041 TCATCAAGAT GGTGCAGGCC CTCCGGCACG GGGAGCTGCC GCCGACGCTG CACGCCGACG
59101 AGCCGTCGCC GCACGTCGAC TGGACGGCCG GCGCCGTCGA ACTGCTGACG TCGGCCCGGC 
59161 CGTGGCCCGA GACCGACCGG CCACGGCGTG CCGCCGTCTC CTCGTTCGGG GTGAGCGGCA 
59221 CCAACGCCCA CGTCATCCTG GAGGCCGGAC CGGTAACGGA GACGCCCGCG GCATCGCCTT 
59281 CCGGTGACCT TCCCCTGCTG GTGTCGGCAC GCTCACCGGA AGCGCTCGAC GAGCAGATCC
59341 GCCGACTGCG CGCCTACCTG GACACCACCC CGGACGTCGA CCGGGTGGCC GTGGCACAGA
59401 CGCTGGCCCG GCGCACACAC TTCGCCCACC GCGCCGTGCT GCTCGGTGAC ACCGTCATCA
59461 CCACACCCCC CGCGGACCGG CCCGACGAAC TCGTCTTCGT CTACTCCGGC CAGGGCACCC
59521 AGCATCCCGC GATGGGCGAG CAGCTCGCCG CCGCCCATCC CGTGTTCGCC GACGCCTGGC
59581 ATGAAGCGCT CCGCCGCCTT GACAACCCCG ACCCCCACGA CCCCACGCAC AGCCAGCATG
59641 TGCTCTTCGC CCACCAGGCG GCGTTCACCG CCCTCCTGCG GTCCTGGGGC ATCACCCCGC
59701 ACGCGGTCAT CGGCCACTCG CTGGGCGAGA TCACCGCGGC GCACGCCGCC GGCATCCTGT
59761 CGCTGGACGA CGCGTGCACC CTGATCACCA CGCGCGCCCG CCTCATGCAC ACGCTCCCGC
59821 CACCCGGTGC CATGGTCACC GTACTGACCA GCGAAGAGAA GGCACGCCAG GCGTTGCGGC
59881 CGGGCGTGGA GATCGCCGCC GTCAACGGGC CCCACTCCAT CGTGCTGTCC GGGGACGAGG
59941 ACGCCGTGCT CACCGTCGCC GGGCAGCTCG GCATCCACCA CCGCCTGCCC GCCCCGCACG 
60001 CCGGGCACTC CGCGCACATG GAGCCCGTGG CCGCCGAGCT GCTCGCCACC ACCCGCGGGC 
60061 TCCGCTACCA CCCTCCCCAC ACCTCCATTC CGAACGACCC CACCACCGCT GAGTACTGGG 
60121 CCGAGCAGGT CCGCAAGCCC GTGCTGTTCC ACGCCCAGGC GCAGCAGTAC CCGGACGCCG
60181 TGTTCGTGGA GATCGGCCCC GCCCAGGACC TCTCCCCGCT CGTCGACGGG ATCCCGCTGC
60241 AGAACGGCAC CGCGGACGAG GTGCACGCGC TGCACACCGC GCTCGCGCAC CTCTACGCGC 
60301 GCGGTGCCAC GCTCGACTGG CCCCGCATCC TCGGGGCTGG GTCACGGCAC GACGCGGATG 
60361 TGCCCGCGTA CGCGTTCCAA CGGCGGCACT ACTGGATCGA GTCGGCACGC CCGGCCGCAT 
60421 CCGACGCGGG CCACCCCGTG CTGGGCTCCG GTATCGCCCT CGCCGGGTCG CCGGGCCGGG
60481 TGTTCACGGG TTCCGTGCCG ACCGGTGCGG ACCGCGCGGT GTTCGTCGCC GAGCTG<3CGC
60541 TGGCCGCCGC GGACGCGGTC GACTGCGCCA CGGTCGAGCG GCTCGACATC GCCTCCGTGC
60601 CCGGCCGGCC GGGCCATGGC CGGACGACCG TACAGACCTG GGTCGACGAG CCGGCGGACG
60661 ACGGCCGGCG CCGGTTCACC GTGCACACCC GCACCGGCGA CGCCCCGTGG ACGCTGCACG
60721 CCGAGGGGGT GCTGCGCCCC CATGGCACGG CCCTGCCCGA TGCGGCCGAC GCCGAGTGGC
60781 CCCCACCGGG CGCGGTGCCC GCGGACGGGC TGCCGGGTGT GTGGCGCCGG GGGGACCAGG
60841 TCTTCGCCGA GGCCGAGGTG GACGGACCGG ACGGTTTCGT GGTGCACCCC GACCTGCTCG
60901 ACGCGGTCTT CTCCGCGGTC GGCGACGGAA GCCGCCAGCC GGCCGGATGG CGCGACCTGA
60961 CGGTGCACGC GTCGGACGCC ACCGTACTGC GCGCCTGCCT CACCCGGCGC ACCGACGGAG 
61021 CCATGGGATT CGCCGCCTTC GACGGCGCCG GCCTGCCGGT ACTCACCGCG GAGGCGGTGA 
61081 CGCTGCGGGA GGTGGCGTCA CCGTCCGGCT CCGAGGAGTC GGACGGCCTG CACCGGTTGG
61141 AGTGiSCTCGC GGTCGCCGAG GCGGTCTACG ACGGTGACCT GCCCGAGGGA CATGTCCTGA 
61201 TCACCGCCGC CCACCCCGAC GACCCCGAGG ACATACCCAC CCGCGCCCAC ACCCGCGCCA 
61261 CCCGCGTCCT GACCGCCCTG CAACACCACC TCACCACCAC CGACCACACC CTCATCGTCC
61321 ACACCACCAC CGACCCCGCC GGCGCCACCG TCACCGGCCT CACCCGCACC GCCCAGAACG
61381 AACACCCCCA CCGCATCCGC CTCATCGAAA CCGACCACCC CCACACCCCC CTCCCCCTGG
61441 CCCAACTCGC CACCCTCGAC CACCCCCACC TCCGCGTCAC CCACCACACC CTCCACCACC
61501 CCCACCTCAC CCCCCTCCAC ACCACCACCC CACCCACCAC CACCCCCCTC AACCCCGAAC 
61561 ACGCCATCAT CATCACCGGC GGCTCCGGCA CCCTCGCCGG CATCCTCGCC CGCCACCTGA 
61621 ACCACCCCCA CACCTACCTC CTCTCCCGCA CCCCACCCCC CGACGCCACC CCCGGCACCC 
61681 ACCTCCCCTG CGACGTCGGC GACCCCCACC AACTCGCCAC CACCCTCACC CACATCCCCC
61741 AACCCCTCAC CGCCATCTTC CACACCGCCG CCACCCTCGA CGACGGCATC CTCCACGCCC
61801 TCACCCCCGA CCGCCTCACC ACCGTCCTCC ACCCCAAAGC CAACGCCGCC TGGCACCTGC
61861 ACCACCTCAC CCAAAACCAA CCCCTCACCC ACTTCGTCCT CTACTCCAGC GCCGCCGCCG
61921 TCCTCGGCAG TCCCGGACAA GGAAACTACG CCGCCGCCAA CGCCTTCCTC GACGCCCTCG
61981 CCACCCACCG CCACACCCTC GGCCAACCCG CCACCTCCAT CGCCTGGGGC ATGTGGCACA
62041 CCACCAGCAC CCTCACCGGA CAACTCGACG ACGCCGACCG GGACCGCATC CGCCGCGGCG 
62101 GTTTCCTCCC GATCACGGAC GACGAGGGCA TGCGCCTCTA CGAGGCGGCC GTCGGCTCCG 
62161 GCGAGGACTT CGTCATGGCC GCCGCGATGG ACCCGGCACA GCCGATGACC GGCTCCGTAC 
62221 CGCCCATCCT GAGCGGCCTG CGCAGGAGCG CGCGGCGCGT CGCCCGTGCC GGGCAGACGT 

62281 TCGCCCAGCG GCTCGCCGAG CTGCCCGACG CCGACCGCGG CGCGGCGCTG ACCACCCTCG
62341 TCTCGGACGC CACGGCCGCC GTGCTCGGCC ACGCCGACGC CTCCGAGATC GCGCCGACCA
62401 CGACGTTCAA GGACCTCGGC ATCGACTCGC TCACCGCGAT CGAGCTGCGC AACCGGCTCG
62461 CGGAGGCGAC CGGGCTGCGG CTGAGTGCCA CGCTGGTGTT CGACCACCCG ACACCTCGGG
62521 TCCTCGCCGC CAAGCTCCGC ACCGATCTGT TCGGCACGGC CGTGCCCACG CCCGCGCGGA 

62581 CGGCACGGAC CCACCACGAC GAGCCACTCG CGATCGTCGG CATGGCGTGC CGACTGCCCG
62641 GCGGGGTCGC CTCGCCGGAG GACCTGTGGC AGCTCGTGGC GTCCGGCACC GACGCGATCA
62701 CCGAGTTCCC CACCGACCGC GGCTGGGACA TCGACCGGCT GTTCGACCCG GACCCGGACG
62761 CCCCCGGCAA GACCTACGTC CGGCACGGCG GCTTCCTCGC CGAGGCCGCC GGCTTCGATG
62821 CCGCGTTCTT CGGCATCAGC CCGCGCGAGG CACGGGCCAT GGACCCGCAG CAGCGCGTCA 

62881 TCCTCGAAAC CTCCTGGGAG GCGTTCGAGA ACGCGGGCAT CGTGCCGGAC ACGCTGCGCG
62941 GCAGCGACAC CGGCGTGTTC ATGGGCGCGT TCTCCCATGG GTACGGCGCC GGCGTCGACC 
63001 TGGGCGGGTT CGGCGCCACC GCCACGCAGA ACAGCGTGCT CTCCGGCCGG TTGTCGTACT 
63061 TCTTCGGCAT GGAGGGCCCG GCCGTCACCG TCGACACCGC CTGCTCGTCG TCGCTGGTCG 
63121 CCCTGCACGA GGCGGCACAG GCGCTGCGGA CTGGAGAATG CTCGCTGGCG CTCGCCGGCG 

63181 GTGTCACGGT GATGCCCACC CCGCTGGGCT ACGTCGAGTT CTGCCGCCAG CGGGGACTCG
63241 CCCCCGACGG CCGTTGCCAG GCCTTCGCGG AAGGCGCCGA CGGCACGAGC TTCTCGGAGG 
63301 GCGCCGGCGT TCTTGTGCTG GAGCGGCTCT CCGACGCCGA GCGCAACGGA CACACCGTCC 
63361 TCGCGGTCGT CCGCTCCTCC GCCGTCAACC AGGACGGCGC CTCCAACGGC ATCTCCGCAC 
63421 CCAACGGCCC CTCCCAGCAG CGCGTCATCC GCCAGGCCCT CGACAAGGCC GGGCTCGGCC 

63481 CCGCCGACGT GGACGTGGTG GAGGCCCACG GCACCGGAAC CCCGCTGGGC GACCCGATCG
63541 AGGCACAGGC CATCATCGCG ACCTACGGCC AGGACCGCGA CACACCGCTC TACCTCGGTT 
63601 CGGTCAAGTC GAACATCGGA CACACCCAGA CCACCGCCGG TGTCGCCGGC GTCATCAAGA 

63661 TGGTCATGGC GATGCGCCAC GGCATCGCGC CGAAGACACT GCACGTGGAC GAGCCGTCGT
63721 CGCATGTGGA CTGGACCGAG GGTGCGGTGG AACTGCTCAC CGAGGCGAGG CCGTGGCCCG 

63781 ACGCGGGACG CCCGCGCCGC GCGGGCGTGT CGTCGCTCGG TATCAGCGGT ACGAACGCCC
63841 ACGTGATCCT TGAGGGTGTT CCCGGGCCGT CGCGTGTGGA GCCGTCTGTT GACGGGTTGG 
63901 TGCCGTTGCC GGTGTCGGCT CGGAGTGAGG CGAGTCTGCG GGGGCAGGTG GAGCGGCTGG 
63961 AGGGGTATCT GCGCGGGAGT GTGGATGTGG CCGCGGTCGC GCAGGGGTTG GTGCGTGAGC 

64021 GTGCTGTCTT CGGTCACCGT GCGGTACTGC TGGGTGATGC CCGGGTGATG GGTGTGGCGG
64081 TGGATCAGCC GCGTACGGTG TTCGTCTTTC CCGGGCAGGG TGCTCAGTGG GTGGGCATGG
64141 GTGTGGAGTT GATGGACCGT TCTGCGGTGT TCGCGGCTCG TATGGAGGAG TGTGCGCGGG
64201 CGTTGTTGCC GCACACGGGC TGGGATGTGC GGGAGATGTT GGCGCGGCCG GATGTGGCGG
64261 AGCGGGTGGA GGTGGTCCAG CCGGCCAGCT GGGCGGTCGC GGTCAGCCTG GCCGCACTGT
64321 GGCAGGCCCA CGGGGTCGTA CCCGACGCGG TGATCGGACA CTCCCAGGGC GAGATCGCGG
64381 CGGCGTGCGT GGCCGGGGCC CTCAGCCTTG AGGACGCCGC CCGCGTGGTG GCCTTGCGCA
64441 GCCAGGTCAT CGCGGCGCGA CTGGCCGGGC GGGGAGCGAT GGCTTCGGTG GCATTGCCGG
64501 CCGGTGAGGT CGGTCTGGTC GAGGGCGTGT GGATCGCGGC GCGTAACGGC CCCGCCTCGA
64561 CAGTCGTGGC CGGCGAGCCG TCGGCGGTGG AGGACGTGGT GACGCGGTAT GAGACCGAAG
64621 GCGTGCGAGT GCGTCGTATC GCCGTCGACT ACGCCTCCCA CACGCCCCAC GTGGAAGCCA
64681 TCGAGGACGA ACTCGCTGAG GTACTGAAGG GAGTTGCAGG GAAGGCCGCG TCGGTGGCGT
64741 GGTGGTCGAC CGTGGACAGC GCCTGGGTGA CCGAGCCGGT GGATGAGAGT TACTGGTACC
64801 GGAACCTGCG TCGCCCCGTC GCGCTGGACG CGGCGGTGGC GGAGCTGGAC GGGTCCGTGT
64861 TCGTGGAGTG CAGCGCCCAT CCGGTGCTGC TGCCGGCGAT GGAACAGGCC CACACGGTGG
64921 CGTCGTTGCG CACCGGTGAC GGCGGCTGGG AGCGATGGCT GACGGCGTTG GCGCAGGCGT
64981 GGACCCTGGG CGCGGCAGTG GACTGGGACA CGGTGGTCGA ACCGGTGCCA GGGCGGCTGC
65041 TCGATCTGCC CACCTACGCG TTCGAGCGCC GGCGCTACTG GCTGGAAGCG GCCGGTGCCA
65101 CCGACCTGTC CGCGGCCGGG CTGACAGGGG CAGCACATCC CATGCTGGCC GCCATCACGG 
65161 CACTACCCGC CGACGACGGT GGTGTTGTTC TCACCGGCCG GATCTCGTTG CGCACGCATC 
65221 CCTGGCTGGC TGATCACGCG GTGCGGGGCA CGGTCCTGCT GCCGGGCACG GCCTTTGTGG 
65281 AGCTGGTCAT CCGGGCCGGT GACGAGACCG GTTGCGGGAT AGTGGATGAA CTGGTCATCG
65341 AATCCCCCCr CGTGGTGCCG GCGACCGCAG CCGTGGATCT GTCGGTGACC GTGGAAGGAG
65401 CTGACGAGGC CGGACGGCGG CGAGTGACCG TCCACGCCCG CACCGAAGGC ACCGGCAGCT
65461 GGACCCGGCA CGCCAGCGGC ACCCTGACCC CCGACACCCC CGACACCCCC AACGCTTCCG
65521 GTGTTGTCGG TGCGGAGCCG TTCTCGCAGT GGCCACCTGC CACTGCCGCG GCCGTCGACA 
65581 CCTCGGAGTT CTACTTGCGC CTGGACGCGC TGGGCTACCG GTTCGGACCC ATGTTCCGCG 
65641 GAATGCGGGC TGCCTGGCGT GATGGTGACA CCGTGTACGC CGAGGTCGCG CTCCCCGAGG
65701 ACCGTGCCGC CGACGCGGAC GGTTTCGGCA TGCACCCGGC GCTGCTCGAC GCGGCCTTGC
65761 AGAGCGGCAG CCTGCTCATG CTGGAATCGG ACGGCGAGCA GAGCGTGCAA CTGCCGTTCT
65821 CCTGGCACGG CGTCCGGTTC CACGCGACGG GCGCGACCAT GCTGCGGGTG GCGGTCGTAC 
65881 CGGGCCCGGA CGGCCTCCGG CTGCATGCCG CGGACAGCGG GAACCGTCCC GTCGCGACGA 
65941 TCGACGCGCT CGTGACCCGG TCCCCGGAAG CGGACCTCGC GCCCGCCGAT CCGATGCTGC
66001 GGGTCGGGTG GGCCCCGGTG CCCGTACCTG CCGGGGCCGG TCCGTCCGAC GCGGACGTGC 
66061 TGACGCTGCG CGGCGACGAC GCCGACCCGC TCGGGGAGAC CCGGGACCTG ACCACCCGTG 
66121 TTCTCGACGC GCTGCTCCGG GCCGACCGGC CGGTGATCTT CCAGGTGACC GGTGGCCTCG 
66181 CCGCCAAGGC GGCCGCAGGC CTGGTCCGCA CCGCTCAGAA CGAGCAGCCC GGCCGCTTCT 
66241 TCCTCGTCGA AACGGACCCG GGAGAGGTCC TGGACGGCGC GAAGCGCGAC GCGATCGCGG
66301 CACTCGGCGA GCCCCATGTG CGGCTGCGCG ACGGCCTCTT CGAGGCAGCC CGGCTGATGC 
66361 GGGCCACGCC GTCCCTGACG CTCCCGGACA CCGGGTCGTG GCAGCTGCGG CCGTCCGCCA 
66421 CCGGTTCCCT CGACGACCTT GCCGTCGTCC CCACCGACGC CCCGGACCGG CCGCTCGCGG
66481 CCGGCGAGGT GCGGATCGCG GTACGCGCGG CGGGCCTGAA CTTCCGGGAT GTCACGGTCG 
66541 CGCTCGGTGT GGTCGCCGAT GCGCGTCCGC TCGGCAGCGA GGCCGCGGGT GTCGTCCTGG 
66601 AGACCGGCCC CGGTGTGCAC GACCTGGCGC CCGGCGACCG GGTCCTGGGG ATGCTCGCGG 
66661 GCGCCTTCGG ACCGGTCGCG ATCACCGACC GGCGGCTGCT CGGCCGGATG CCGGACGGCT 
66721 GGACGTTCCC GCAGGCGGCG TCCGTGATGA CCGCGTTCGC GACCGCGTGG TACGGCCTGG 
66781 TCGACCTGGC CGGGCTGCGC CCCGGCGAGA AGGTCCTGAT CCACGCGGCG GCGACCGGTG 
66841 TCGGCGCGGC GGCCGTCCAG ATCGCGCGGC ATCTGGGCGC GGAGGTGTAC GCGACCACCA 
66901 GCGCCGCGAA GCGCCATCTG GTGGACCTGG ACGGAGCGCA TCTGGCCGAT TCCCGCAGCA 
66961 CCGCGTTCGC CGACGCGTTC CCGCCGGTCG ATGTCGTGCT CAACTCGCTC ACCGGTGAAT 
67021 TCCTCGACGC GTCCGTCGGC CTGCTCGCGG CGGGTGGCCG GTTCATCGAG ATGGGGAAGA 
67081 CGGACATCCG GCACGCCGTC CAGCAGCCGT TCGACCTGAT GGACGCCGGC CCCGACCGGA 
67141 TGCAGCGGAT CATCGTCGAG CTGCTCGGCC TGTTCGCGCG CGACGTGCTG CACCCGCTGC 
67201 CGGTCCACGC CTGGGACGTG CGGCAGGCGC GGGAGGCGTT CGGCTGGATG AGCAGCGGGC 
67261 GTCACACCGG CAAGCTGGTG CTGACGGTCC CGCGGCCGCT GGATCCCGAG GGGGCCGTCG
67321 TCATCACCGG CGGCTCCGGC ACCCTCGCCG GCATCCTCGC CCGCCACCTG GGCCACCCCC 
67381 ACACCTACCT GCTCTCCCGC ACCCCACCCC CCGACACCAC CCCCGGCACC CACCTCCCCT
67441 GCGACGTCGG CGACCCCCAC CAACTCGCCA CCACCCTCGC CCGCATCCCC CAACCCCTCA 
67501 CCGCCGTCTT CCACACCGCC GGAACCCTCG ACGACGCCCT GCTCGACAAC CTCACCCCCG
67561 ACCGCGTCGA CACCGTCCTC AAACCCAAGG CCGACGCCGC CTGGCACCTG CACCGGCTCA 
67621 CCCGCGACAC CGACCTCGCC GCGTTCGTCG TCTACTCCGC GGTCGCCGGC CTCATGGGCA
67681 GCCCGGGGCA GGGCAACTAC GTCGCGGCGA ACGCGTTCCT CGACGCGCTC GCCGAACACC
67741 GCCGTGCGCA AGGGCTGCCC GCGCAGTCCC TCGCATGGGG CATGTGGGCG GACGTCAGCG 
67801 CGCTCACCGC GAAACTCACC GACGCGGACC GCCAGCGCAT CCGGCGCAGC GGATTCCCGC 
67861 CGTTGAGCGC CGCGGACGGC ATGCGGCTGT TCGACGCGGC GACGCGTACC CCGGAACCGG
67921 TCGTCGTCGC GACGACCGTC GACCTCACCC AGCTCGACGG CGCCGTCGCG CCGTTGCTCC
67981 GCGGTCTGGC CGCGCACCGG GCCGGGCCGG CGCGCACGGT CGCCCGCAAC GCCGGCGAAG
68041 AGCCCCTGGC CGTGCGTCTT GCCGGGCGTA CCGCCGCCGA GCAGCGGCGC ATCATGCAGG 
68101 AGGTCGTGCT CCGCCACGCG GCCGCGGTCC TCGCGTACGG GCTGGGCGAC CGCGTGGCGG 
68161 CGGACCGTCC GTTCCGCGAG CTCGGTTTCG ATTCGCTGAC CGCGGTCGAC CTGCGCAATC 
68221 GGCTCGCGGC CGAGACGGGG CTGCGGCTGC CGACGACGCT GGTGTTCAGC CAcdcGACGG 
68281 CGGAGGCGCT CACCGCCCAC CTGCTCGACC TGATCGACGC TCCCACCGCC CGGATCGCCG 
68341 GGGAGTCCCT GCCCGCGGTG ACGGCCGCTC CCGTGGCGGC CGCGCGGGAC CAGGACGAGC 
68401 CGATCGCCAT CGTGGCGATG GCGTGCCGGC TGCCCGGTGG TGTGACGTCG CCCGAGGACC
68461 TGTGGCGGCT CGTCGAGTCC GGCACCGACG CGATCACCAC GCCTCCTGAC GACCGCGGCT
68521 GGGACGTCGA CGCGCTGTAC GACGCGGACC CGGACGCGGC CGGCAAGGCG TACAACCTGC 

68 581 GGGGCGGTTA CCTGGCCGGG GCGGCGGAGT TCGACGCGGC GTTCTTCGAC ATCAGTCCGC 
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68 641 GCGAAGCGCT CGGCATGGAC CCGCAGCAAC GCCTGCTGCT CGAAACGGCG TGGGAGGCGA 
68701 TCGAGCGCGG CCGGATCAGT CCGGCGTCGC TCCGCGGCCG GGAGGTCGGC GTCTATGTCG 
68761 GTGCGGCCGC GCAGGGCTAC GGGCTGGGCG CCGAGGACAC CGAGGGCCAC GCGATCACCG
68821 GTGGTTCCAC GAGCCTGCTG TCCGGACGGC TGGCGTACGT GCTCGGGCTG GAGGGCCCGG
68881 CGGTGACCGT GGACACGGCG TGCTCGTCGT CTCTGGTCGC GCTGCATCTG GCGTGCCAGG 
68941 GGCTGCGCCT GGGCGAGTGC GAACTCGCTC TGGCCGGAGG GGTCTCCGTA CTGAGTTCGC
69001 CGGCCGCGTT CGTGGAGTTC TCCCGCCAGC GCGGGCTCGC GGCCGACGGG CGCTGCAAGT 
69061 CGTTCGGCGC GGGCGCGGAC GGCACGACGT GGTCCGAGGG CGTGGGCGTG CTCGTACTGG 
69121 AACGGCTCTC CGACGGCGAG CGGCTCGGGC ACACCGTGCT CGCCGTCGTC CGCGGCAGCG 
69181 CCGTCACGTC CGACGGCGCC TCCAACGGCC TCACCGCGCC GAACGGGCTC TCGCAGCAGC 
69241 GGGTCATCCG GAAGGCGCTC GCCGCGGCCG GGCTGACCGG CGCCGACGTG GACGTCGTCG 
69301 AGGGGCACGG CACCGGCACC CGGCTCGGCG ACCCGGTCGA GGCGGACGCG CTGCTCGCGA 
69361 CGTACGGGCA GGACCGTCCG GCACCGGTCT GGCTGGGCTC GCTGAAGTCG AACATCGGAC 
69421 ATGCCACGGC CGCGGCCGGT GTCGCGGGCG TCATCAAGAT GGTGCAGGCG ATCGGCGCGG 
69481 GCACGATGCC GCGGACGCTG CATGTGGAGG AGCCCTCGCC CGCCGTCGAC TGGAGCACCG
69541 GACAGGTGTC CCTGCTCGGC TCCAACCGGC CCTGGCCGGA CGACGAGCGT CCGCGCCGGG 
69601 CGGCCGTCTC CGCGTTCGGG CTCAGCGGGA CGAACGCGCA CGTCATCCTG GAACAGCACC 
69661 GTCCGGCGCC CGTGGCGTCC CAGCCGCCCC GGCCGCCCCG TGAGGAGTCC CAGCCGCTGC 
69721 CGTGGGTGCT CTCCGCGCGG ACTCCGGCCG CGCTGCGGGC CCAGGCGGCC CGGCTGCGCG 
69781 ACCACCTCGC GGCGGCACCG GACGCGGATC CGTTGGACAT CGGGTACGCG CTGGCCACCA 
69841 GCCGCGCCCA GTTCGCCCAC CGTGCCGCGG TCGTCGCCAC CACCCCGGAC GGATTCCGTG
69901 CCGCGCTCGA CGGCCTCGCG GACGGCGCGG AGGCGCCCGG AGTCGTCACC GGGACCGCTC 
69961 AGGAGCGGCG CGTCGCCTTC CTCTTCGACG GCCAGGGCGC CCAGCGCGCC GGAATGGGGC 
70021 GCGAGCTCCA CCGCCGGTTC CCCGTCTTCG CCGCCGCGTG GGACGAGGTC TCCGACGCGT
70081 TCGGCAAGCA CCTCAAGCAC TCCCCCACGG ACGTCTACCA CGGCGAACAC GGCGCTCTCG
70141 CCCATGACAC CCTGTACGCC CAGGCCGGCC TGTTCACGCT CGAAGTGGCG CTGCTGCGGC 
70201 TGCTGGAGCA CTGGGGGGTG CGGCCGGACG TGCTCGTCGG GCACTCCGTC GGCGAGGTGA 
70261 CCGCGGCGTA CGCGGCGGGG GTGCTCACCC TGGCGGACGC GACGGAGTTG ATCGTGGCCC 
70321 GGGGGCGGGC GCTGCGGGCG CTGCCGCCCG GGGCGATGCT CGCCGTCGAC GGAAGCCCGG
70381 CGGAGGTCGG CGCCCGCACG GATCTGGACA TCGCCGCGGT CAACGGCCCG TCCGCCGTGG 
70441 TGCTCGCCGG TTCGCCGGAC GATGTGGCGG CGTTCGAACG GGAGTGGTCG GCGGCCGGGC 
70501 GGCGCACGAA ACGGCTCGAC GTCGGGCACG CGTTCCACTC CCGGCACGTC GACGGTGCGC 
70561 TCGACGGCTT CCGTACGGTG CTGGAGTCGC TCGCGTTCGG CGCGGCGCGG CTGCCGGTGG 
70621 TGTCCACGAC GACGGGCCGG GACGCCGCGG ACGACCTCAT AACGCCCGCG CACTGGCTGC 
70681 GCCATGCGCG TCGGCCGGTG CTGTTCTCGG ATGCCGTCCG GGAGCTGGCC GACCGCGGCG 
70741 TCACCACGTT CGTGGCCGTC GGCCCCTCCG GCTCCCTGGC GTCGGCCGCG GCGGAGAGCG
70801 CCGGGGAGGA CGCCGGGACC TACCACGCGG TGCTGCGCGC CCGGACCGGT GAGGAGACCG 
70861 CGGCGCTGAC CGCCCTCGCC GAGCTGCACG CCCACGGCGT CCCGGTCGAC CTGGCCGCGG
70921 TACTGGCCGG TGGCCGGCCA GTGGACCTTC CCGTGTACGC GTTCCAGCAC CGTTCCTACT
70981 GGCTGGCCCC GGCCGTGGCG GGGGCGCCGG CCACCGTGGC GGACACCGGG GGTCCGGCGG
71041 AGTCCGAGCC GGAGGACCTC ACCGTCGCCG AGATCGTCCG TCGGCGCACC GCGGCGCTGC 
71101 TCGGCGTCAC GGACCCCGCC GACGTCGATG CGGAAGCGAC GTTCTTCGCG CTCGGTTTCG 
71161 ACTCACTGGC GGTGCAGCGG CTGCGCAACC AGCTCGCCTC GGCAACCGGG CTGGACCTGC
71221 CGGCGGCCGT CCTGTTCGAC CACGACACCC CGGCCGCGCT CACCGCGTTC CTCCAGGACC 
71281 GGATCGAGGC CGGCCAGGAC CGGATCGAGG CCGGCGAGGA CGACGACGCG CCCACCGTGC 
71341 TCTCGCTCCT GGAGGAGATG GAGTCGCTCG ACGCCGCGGA CATCGCGGCG AGGCCGGCCC
71401 CGGAGCGTGC GGCCATCGCC GATCTGCTCG ACAAGCTCGC CCATACCTGG AAGGACTACC 
71461 GATGAGCACC GATACGCACG AGGGAACGCC GCCCGCCGGC CGCTGCCCAT TCGCGATCCA
71521 GGACGGTCAC CGCGCCATCC TGGAGAGCGG CACGGTGGGT TCGTTCGACC TGTTCGGCGT 
71581 CAAGCACTGG CTGGTCGCCG CCGCCGAGGA CGTCAAGCTG GTCACCAACG ATCCGCGGTT 
71641 CAGCTCGGCC GCGCCGTCCG AGATGCTGCC CGACCGGCGG CCCGGCTGGT TCTCCGGGAT 
71701 GGACTCACCG GAGCACAACC GCTACCGGCA GAAGATCGCG GGGGACTTCA CACTGCGCGC 
71761 GGCGCGCAAG CGGGAGGACT TCGTCGCCGA GGCCGCCGAC GCCTGCCTGG ACGACATCGA
71821 GGCCGCGGGA CCCGGCACCG ACCTCATCCC CGGGTACGCC AAGCGGCTGC CCTCCCTCGT 
71881 CATCAACGCG CTGTACGGGC TCACCCCTGA GGAGGGGGCC GTGCTGGAGG CACGGATGCG 
71941 CGACATCACC GGCTCGGCCG ATCTGGACAG CGTCAAGACG CTGACCGACG ACTTCTTCGG 
72001 GCACGCGCTG CGGCTGGTCC GCGCGAAGCG TGACGAGCGG GGCGAGGACC TGCTGCACCG 
72061 GCTGGCCTCG GCCGACGACG GCGAGATCTC GCTCAGCGAC GACGAGGCGA CGGGCGTGTT 
72121 CGCGACGCTG CTGTTCGCCG GCCACGACTC GGTGCAGCAG ATGGTCGGCT ACTGCCTCTA 
72181 CGCACTGCTC AGCCACCCCG AGCAGCAGGC GGCGCTGCGC GCGCGCCCGG AGCTGGTCGA 
72241 CAACGCGGTC GAGGAGATGC TCCGTTTCCT GCCCGTCAAC CAGATGGGCG TACCGCGCGT
72301 CTGTGTCGAG GACGTCGATG TGCGGGGCGT GCGCATCCGT GCGGGCGACA ACGTGATCCC 
72361 GCTCTACTCG ACGGCCAACC GCGACCCCGA GGTGTTCCCG CAGCCCGACA CCTTCGATGT 
72421 GACGCGCCCG CTGGAGGGCA ACTTCGCGTT CGGCCACGGC ATTCACAAGT GTCCCGGCCA
72481 GCACATCGCC CGGGTGCTCA TCAAGGTCGC CTGCCTGCGG TTGTTCGAGC GTTTCCCGGA
72541 CGTCCGGCTG GCCGGCGACG TGCCGATGAA CGAGGGGCTC GGGCTGTTCA GCCCGGCCGA 
72601 GCTGCGGGTC ACCTGGGGGG CGGCATGAGT CACCCGGTGG AGACGTTGCG GTTGCCGAAC
72661 GGGACGACGG TCGCGCACAT CAACGCGGGC GAGGCGCAGT TCCTCTACCG GGAGATCTTC
72721 ACCCAGCGCT GCTACCTGCG CCACGGTGTC GACCTGCGCC CGGGGGACGT GGTGTTCGAC 
72781 GTCGGCGCGA ACATCGGCAT GTTCACGCTT TTCGCGCATC TGGAGTGTCC TGGTGTGACC
72841 GTGCACGCCT TCGAGCCCGC GCCCGTGCCG TTCGCGGCGC TGCGGGCGAA CGTGACGCGG 
72901 CACGGCATCC CGGGCCAGGC GGACCAGTGC GCGGTCTCCG ACAGCTCCGG CACGCGGAAG 
72961 ATGACCTTCT ATCCCGACGC CACGCTGATG TCCGGTTTCC ACGCGGATGC CGCGGCCCGG 
73021 ACGGAGCTGT TGCGCACGCT CGGCCTCAAC GGCGGCTACA CCGCCGAGGA CGTCGACACC 
73081 ATGCTCGCGC AACTGCCCGA CGTCAGCGAG GAGATCGAAA CCCCTGTGGT CCGGCTCTCC
73141 GACGTCATCG CGGAGCGCGG TATCGAGGCC ATCGGCCTGC TGAAGGTCGA CGTGGAGAAG 
73201 AGCGAACGGC AGGTCTTCGC CGGCCTCGAG GACACCGACT GGCCCCGTAT CCGCCAGGTC
73261 GTCGCGGAGG TCCACGACAT CGACGGCGCG CTCGAGGAGG TCGTCACGCT GCTCCGCGGC 
73321 CATGGCTTCA CCGTGGTCGC CGAGCAGGAA CCGCTGTTCG CCGGCACGGG CATCCACCAG 
73381 GTCGCCGCGC GGCGGGTGGC CGGCTGAGCG CCGTCGGGGC CGCGGCCGTC CGCACCGGCG
73441 GCCGCGGTGC GGACGGCGGC TCAGCCGGCG TCGGACAGTT CCTTGGGCAG TTGCTGACGG
73501 CCCTTCACCC CCAGCTTGCG GAACACGTTG GTGAGGTGCT GTTCCACCGT GCTGGAGGTG 
73561 ACGAACAGCT GGCTGGCGAT CTCCTTGTTG GTGCGCCCGA CCGCGGCGTG CGACGCCACC
73621 CGCCGCTCCG CCTCGGTCAG CGATGTGATC CGCTGCGCCG GCGTCACGTC CTGGGTGCCG 
73681 TCCGCGTCCG AGGACTCCCC ACCGAGCCGC CGGAGGAGCG GCACGGCTCC GCACTGGGTC
73741 GCGAGGTGCC GTGCGCGGCG GAACAGTCCC CGCGCACGGC TGTGCCGCCG GAGCATGCCG
73801 CACGCTTCGC CCATGTCGGC GAGGACGCGG GCCAGCTCGT ACTGGTCGCG GCACATGATG
73861 AGCAGATCGG CGGCCTCGTC GAGCAGTTCG ATCCGCTTGG CCGGCGGACT GTAGGCCGCC
7 3921 TGCACCCGCA GCGTCATCAC CCGCGCCCGG GAGCCCATCG GCCGGGACAG CTGCTCGGAG 
73981 ATGAGCCTCA GCCCCTCGTC ACGGCCGCGG CCGAGCAGCA GAAGCGCTTC GGCGGCGTCG
74041 ACCCGCCACA GGGCCAGGCC CGGCACGTCG ACGGACCAGC GTCGCATCCG CTCCCCGCAG
74101 TCCCGGAACG CGTTGTACGC CGCCCGGTAC CGCCCGGCCG CGAGATGGTG TTGCCCACGG 
74161 GCCCAGACCA TGTGCAGTCC GAAGAGGCTG TCGGAGGTCT CCTCCGGCAA CGGCTCGGCG 
74221 AGCCACCGCT CCGCCCGGTC CAGGTCGCCC AGTCGGATCG CGGCGGCCAC GGTGCTGCTC 
74281 AGCGGCAATG CGGCGGCCAT CCCCCAGGAG GGCACGACCC GGGGGGCGAG CGCGGCCTCG
74341 CCGCATTCGA CGGCGGCGGT CAGGTCGCCG CGGCGCAGCG CGGCCTCGGC GCGGAACCCC
74401 GCGTGGACCG CCTCGTCGGC CGGGGTCCGC ATGTTGTCGT CACCGGCCAG CTTGTCGACC
74461 CAGGACTGGA CGGCATCGGT GTCCTCGGCG TAGAGCAGGG CCAGCAACGC CATCATGGTC
74521 GTGGTCCGGT CCGTCGTGAC CCGGGAGTGC TGGAGCACGT ACTCGGCTTT GGCCTCGGCC
74581 TGTTCGGACC AGCCGCGCAG CGCGTTGCTC AGGGCCTTGT CGGCGACGGC GCGGTGCCGG
74641 ACGGCTCCGG AAAACGAGGC GACCTCGTGC TCGGCCGGCG GATCGGCCGG ACGCGGCGGA
74701 TCGGCCGCGC CGGGATAGAT CAGCGCGAGG GACAGGTCCG CGACGCGCAG GTGCGCCCGG 
74761 CCCTGCTCGC TCGGGGCGGC GGAGCGCTGG GCCGCCAGGA CCTCGGCGGC CTCGCCCGGC 
74821 CGCCCGTCCA TCGCCAGCCA GCAGGCGAGC GACACGGCGT GCTCGCTGGA GAGGAGCCGT
74881 TCCCGCGACG CGGTGAGCAG CTCGGGCACA TGCCGGCCGG ATCTGGCGGG ATCGCAGAGC
74941 CGCTCGATGG CGGCGGTGTC GACGCGCAGT GCGGCGTGGA CGGCGGGGTC GTCGGAGGCC
75001 CGGTAGGCGA ACTCCAGGTA GGTGACGGCC TCGTCGAGCT CGCCGCGCAG GTGGTGCTCG
75061 CGCGCGGCGT CGGTGAACAG CCCGGCGACC TCGGCGCCGT GCACCCGGCC GGTACCCATC
75121 TGGTGGCGGG CGAGCACCTT GCTGGCCACG CCGCGGTCCC GCAGCAGTTC CAGCGCCAGC
75181 TCGTGCAGGC CACGCCGCTC GGCGGCGGAG AGGTCGTCGA GTACGACGGA GCGGGCCGCG
75241 GGGTGCGGGA ACCGCCCTTC CCGCAGCAGC CGCCCCTCGA CCAGCTGTTC GTGGGCCTGC
75301 TCGACCGCCT CGGTGTCGAG GCCGGTCATC CGCTGGACGA GGGTGAGTTC GACACTCTCG
75361 CCGAGCACGG CGGAAGCTCG GGCGACGCTC AGCGCGGCCG GGCCGCAACG ATAGAGCGAC
75421 CCGAGGTAGG CGAGCCGGTA CGCCCGCCCC GCGACCACTT CCAGGCACCC TGAGGTCCGT
75481 GTCCGTGCCT CCCGGATGTC GTCGATCAGG CCGTGGCCGA GGAGCAGGTT GCCGCCGGTC
75541 GCCCGGAACG CCTGGGCCAC CACGTCGTCG TGCGCGTCCT GGCCGAGGTG CCGGCGCACG 
75601 AGTTCGG7GG TCTGCGCCTC GGTGAGCGGG CGCAGCGCGA TCTCGTGGTA GTGGCGCAGA
75661 CTCAGCAGTG CCGCCCGGAA TTGGGAGTGG GCGGGCGTCG GCCGGAGCAG CTCGGTCAGC
75721 ACGATGGCGA CACGGGCCCG GCTGATGCGG CGCGCGAGGT GGAGCAGGCA CCGCAGCGAC 
75781 GGCGCGTCGG CGTGGTGCAC GTCGTCGATG CCGATCAGTA CGGGCCGCTC CGCGGCGAGC
75841 GTCAGCACCG TGCGGGTGAG TTCGGTCCCC AGGCGGTTGT CGACGTCGGC CGGCAGGTTT
75901 TCGCACGATG CCGTCAGCCG GACCAGCTCC GGTGTCCGGG CGGCCAGCTC GGGCTGGTCG
75961 AGGAGCTGGC CGAGCATGCC GTACGGCAGG GCCCGCTCCT CCATGGAGCA CACCGCGCGA 
76021 AGGGTGACGA AGCCGGCCTT GGCCGCGGCG GCGTCGAGGA GTTCGGTCTT GCCGCAGGCG
76081 ATCGGCCCGG TGACGGCGGC GACGACGCCC CGCCCGCCCC CCGCTCGGGT GAGCGCCCGG
76141 TGGAGGGAAC CGAACTCGTC ATCGCGGGCG ATCAGGTCTG GGGGAGATAA GCGCGCTATC
76201 ACGAATGGAA CTACCTCGCG ACCGTCGTGG AAACCCATAG GCATCACATG GCTTGTTGAT
76261 CTGTACGGCT GTGATTCAGC CTGGCGGGAT GCTGTGCTAC AGATGGGAAG ATGTGATCTA
76321 GGGCCGTGCC GTTCCCTCAG GAGCCGACCG CCCCCGGCGC CACCCGCCGT ACCCCCTGGG
76381 CCACCAGCTC GGCGACCCGC TCCTGGTGGT CGACGAGGTA GAAGTGCCCG CCGGGGAAGA
76441 CCTCCACCGT GGTCGGCGCG GTCGTGTGCC CGGCCCAGGC GTGGGCCTGC TCCACCGTCG
76501 TCTTCGGATC GTCGTCACCG ATGCACACCG TGATCGGCGT CTCCAGCGGC GGCGCGGGCT
76561 CCCACCGGTA CGTCTCCGCC GCGTAGTAGT CCGCCCGCAA CGGCGCCAGG ATCAGCGCGC
76621 GCATTTCGTC GTCCGCCATC ACATCGGCGC TCGTCCCGCC GAGGCCGATG ACCGCCGCCA
76681 GCAGCTCGTC GTCGGACGCG AGGTGGTCCT GGTCGGCGCG CGGCTGCGAC GGCGCCCGCC
76741 GGCCCGAGAC GATCAGGTGC GCCACCGGGA GCCGCTGGGC CAGCTCGAAC GCGAGTGTCG
76801 CGCCCATGCT GTGGCCGAAC AGCACCAGCG GACGGTCCAG CCCCGGCTTC AACGCCTCGG
76861 CCACGAGGCC GGCGAGAACA CGCAGGTCGC GCACCGCCTC CTCGTCGCGG CGGTCCTGGC
76921 GGCCGGGGTA CTGCACGGCG TACACGTCCG CCACCGGGGC GAGCGCACGG GCCAGCGGAA
76981 GGTAGAACGT CGCCGATCCG CCGGCGTGGG GCAGCAGCAC CACCCGTACC GGGGCCTCGG
77041 GCGTGGGGAA GAACTGCCGC AGCCAGAGTT CCGAGCTCAC CGCACCCCCT CGGCCGCGAC
77101 CTGGGGAGCC CGGAACCGGG TGATCTCGGC CAAGTGCTTC TCCCGCATCT CCGGGTCGGT 
77161 CACGCCCCAT CCCTCCTCCG GCGCCAGACA GAGGACGCCG ACTTTGCCGT TGTGCACATT 
77221 GCGATGCACA TCGCGCACCG CCGACCCGAC GTCGTCGAGC GGGTAGGTCA CCGACAGCGT
77281 CGGGTGCACC ATCCCCTTGC AGATCAGGCG GTTCGCCTCC CACGCCTCAC GATAGTTCGC 
77341 GAAGTGGGTA CCGATGATCC GCTTCACGGA CATCCACAGG TACCGATTGT CAAAGGCGTG
77401 CTCGTATCCC GAGGTTGACG CGCAGGTGAC GATCGTGCCA CCCCGACGTG TCACGTAGAC
77461 ACTCGCGCCG AACGTCGCGC GCCCCGGGTG CTCGAACACG ATGTCGGGAT CGTCAQCGCC
77521 GGTCAGCTCC CGGATC 
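
A listing of this kind is commonly checked before use by discarding the position numbers and spacing and confirming that each full line carries 60 bases. The following is a minimal sketch of such a check, not part of the disclosed methods; the function names (extract_bases, check_line, gc_fraction) are hypothetical, and the two sample lines are given only for illustration.

```python
import re
from typing import Tuple


def extract_bases(line: str) -> str:
    """Keep only A, C, G, and T from one listing line (position numbers,
    spaces, and any unrecognized characters are dropped)."""
    return re.sub(r"[^ACGT]", "", line.upper())


def check_line(line: str, expected: int = 60) -> Tuple[str, bool]:
    """Return the bases found and whether the line has the expected length.

    The final line of a listing is normally shorter, so pass a different
    `expected` value (or ignore the flag) for that line.
    """
    bases = extract_bases(line)
    return bases, len(bases) == expected


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a cleaned sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# A well-formed line and a line in which one base is unreadable ("7").
clean = "67201 CGGTCCACGC CTGGGACGTG CGGCAGGCGC GGGAGGCGTT CGGCTGGATG AGCAGCGGGC"
damaged = "75601 AGTTCGG7GG TCTGCGCCTC GGTGAGCGGG CGCAGCGCGA TCTCGTGGTA GTGGCGCAGA"

clean_bases, clean_ok = check_line(clean)
damaged_bases, damaged_ok = check_line(damaged)
print(len(clean_bases), clean_ok)      # length and flag for the clean line
print(len(damaged_bases), damaged_ok)  # a short line signals a lost base
```

A line that fails the length check marks a position where the base should be confirmed against the deposited sequence rather than guessed.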



Those of skill in the art will recognize that, due to the degenerate nature of the 
genetic code, a variety of DNA compounds differing in their nucleotide sequences can be 
used to encode a given amino acid sequence of the invention. The native DNA sequence 
encoding the FK-520 PKS of Streptomyces hygroscopicus is shown herein merely to 
illustrate a preferred embodiment of the invention, and the present invention includes DNA 
compounds of any sequence that encode the amino acid sequences of the polypeptides and 
proteins of the invention. In similar fashion, a polypeptide can typically tolerate one or more 
amino acid substitutions, deletions, and insertions in its amino acid sequence without loss or 
significant loss of a desired activity. The present invention includes such polypeptides with 
alternate amino acid sequences, and the amino acid sequences shown merely illustrate 
preferred embodiments of the invention. 
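
The degeneracy referred to above can be illustrated with a short sketch using the standard genetic code. The sketch is illustrative only: the function names are hypothetical, the first fragment is a 15-base stretch chosen for illustration, and the second fragment is an invented alternative that differs in nucleotide sequence yet encodes the same five amino acids.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding strand, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the given peptide."""
    codons_per_aa = Counter(CODON_TABLE.values())
    total = 1
    for aa in peptide:
        total *= codons_per_aa[aa]
    return total


fragment = "ATGAGCACCGATACG"      # illustrative fragment
alternative = "ATGTCCACTGACACC"   # hypothetical synonymous variant

print(translate(fragment), translate(alternative))
print(count_encodings(translate(fragment)))
```

Both fragments translate to the same peptide, and even this five-residue peptide can be encoded by many distinct DNA sequences, which is why the native sequence is only one of the DNA compounds contemplated.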

The recombinant nucleic acids, proteins, and peptides of the invention are many and 
diverse. To facilitate an understanding of the invention and the diverse compounds and 
methods provided thereby, the following general description of the FK-520 PKS genes and 
modules of the PKS proteins encoded thereby is provided. This general description is 
followed by a more detailed description of the various domains and modules of the FK-520 
PKS contained in and encoded by the compounds of the invention. In this description,
reference to a heterologous PKS refers to any PKS other than the FK-520 PKS. Unless 
otherwise indicated, reference to a PKS includes reference to a portion of a PKS. Moreover, 
reference to a domain, module, or PKS includes reference to the nucleic acids encoding the 
same and vice-versa, because the methods and reagents of the invention provide or enable 
one to prepare proteins and the nucleic acids that encode them. 

The FK-520 PKS is composed of three proteins encoded by three genes designated 
fkbA, fkbB, and fkbC. The fkbA ORF encodes extender modules 7 - 10 of the PKS. The fkbB
ORF encodes the loading module (the CoA ligase) and extender modules 1 - 4 of the PKS.
The fkbC ORF encodes extender modules 5 - 6 of the PKS. The fkbP ORF encodes the
NRPS that attaches the pipecolic acid and cyclizes the FK-520 polyketide. 
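
The assignment of modules to genes stated above can be restated as a small lookup, shown here only as a sketch. The gene names are written as fkbA, fkbB, and fkbC; the dictionary layout and the helper names (gene_for, assembly_order) are assumptions introduced for illustration.

```python
# Modules encoded by each PKS gene, listed in the order in which the
# modules act (loading module first, then extender modules 1 to 10).
PKS_GENES = {
    "fkbB": ["loading"] + [f"module {n}" for n in range(1, 5)],
    "fkbC": [f"module {n}" for n in range(5, 7)],
    "fkbA": [f"module {n}" for n in range(7, 11)],
}


def gene_for(module: str) -> str:
    """Return the gene whose ORF encodes the named module."""
    for gene, modules in PKS_GENES.items():
        if module in modules:
            return gene
    raise KeyError(f"no PKS gene listed for {module!r}")


def assembly_order() -> list:
    """All modules in biosynthetic order, independent of gene order."""
    return [m for modules in PKS_GENES.values() for m in modules]


print(gene_for("module 4"))   # fkbB
print(gene_for("module 6"))   # fkbC
print(assembly_order())
```

The fkbP ORF is omitted from the lookup because it encodes the NRPS rather than a PKS module.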

The loading module of the FK-520 PKS includes a CoA ligase, an ER domain, and 
an ACP domain. The starter building block or unit for FK-520 is believed to be a 
dihydroxycyclohexene carboxylic acid, which is derived from shikimate. The recombinant 
DNA compounds of the invention that encode the loading module of the FK-520 PKS and 
the corresponding polypeptides encoded thereby are useful for a variety of methods and in a 
variety of compounds. In one embodiment, a DNA compound comprising a sequence that 
encodes the FK-520 loading module is inserted into a DNA compound that comprises the 
coding sequence for a heterologous PKS. The resulting construct, in which the coding 
sequence for the loading module of the heterologous PKS is replaced by the coding 
sequence for the FK-520 loading module, provides a novel PKS coding sequence. Examples 
of heterologous PKS coding sequences include the rapamycin, FK-506, rifamycin, and 
avermectin PKS coding sequences. In another embodiment, a DNA compound comprising a 
sequence that encodes the FK-520 loading module is inserted into a DNA compound that 
comprises the coding sequence for the FK-520 PKS or a recombinant FK-520 PKS that 
produces an FK-520 derivative. 

In another embodiment, a portion of the loading module coding sequence is utilized 
in conjunction with a heterologous coding sequence. In this embodiment, the invention 
provides, for example, either replacing the CoA ligase with a different CoA ligase, deleting 
the ER, or replacing the ER with a different ER. In addition, or alternatively, the ACP can 
be replaced by another ACP. In similar fashion, the corresponding domains in another 
loading or extender module can be replaced by one or more domains of the FK-520 PKS. 
The resulting heterologous loading module coding sequence can be utilized in conjunction 
with a coding sequence for a PKS that synthesizes FK-520, an FK-520 derivative, or
another polyketide. 

The first extender module of the FK-520 PKS includes a KS domain, an AT domain 
specific for methylmalonyl CoA, a DH domain, a KR domain, and an ACP domain. The 
recombinant DNA compounds of the invention that encode the first extender module of the 
FK-520 PKS and the corresponding polypeptides encoded thereby are useful for a variety of 
applications. In one embodiment, a DNA compound comprising a sequence that encodes the 
FK-520 first extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding sequence for 
a module of the heterologous PKS is either replaced by that for the first extender module of 
the FK-520 PKS or the latter is merely added to coding sequences for modules of the 
heterologous PKS, provides a novel PKS coding sequence. In another embodiment, a DNA 
compound comprising a sequence that encodes the first extender module of the FK-520 
PKS is inserted into a DNA compound that comprises the remainder of the coding sequence 
for the FK-520 PKS or a recombinant FK-520 PKS that produces an FK-520 derivative. 

In another embodiment, all or only a portion of the first extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a hybrid 
module. In this embodiment, the invention provides, for example, either replacing the 
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or 2-
hydroxymalonyl CoA specific AT; deleting either the DH or KR or both; replacing the DH
or KR or both with another DH or KR; and/or inserting an ER. In replacing or inserting KR,
DH, and ER domains, it is often beneficial to replace the existing KR, DH, and ER domains
with the complete set of domains desired from another module. Thus, if one desires to insert 
an ER domain, one may simply replace the existing KR and DH domains with a KR, DH, 
and ER set of domains from a module containing such domains. In addition, the KS and/or 
ACP can be replaced with another KS and/or ACP. In each of these replacements or 
insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate 
from a coding sequence for another module of the FK-520 PKS, from a gene for a PKS that 
produces a polyketide other than FK-520, or from chemical synthesis. The resulting 
heterologous first extender module coding sequence can be utilized in conjunction with a 
coding sequence for a PKS that synthesizes FK-520, an FK-520 derivative, or another 
polyketide. In similar fashion, the corresponding domains in a module of a heterologous 
PKS can be replaced by one or more domains of the first extender module of the FK-520
PKS. 
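
The domain replacements described for the first extender module can be pictured with a simple model, offered as a sketch under stated assumptions rather than as a representation of any actual construct. The Module class, its fields, and the functions swap_at and set_reductive are hypothetical; the model tracks only which domains are present and does not predict whether a hybrid module would be functional.

```python
from dataclasses import dataclass, replace
from typing import Tuple

REDUCTIVE_DOMAINS = {"KR", "DH", "ER"}


@dataclass(frozen=True)
class Module:
    name: str
    at_substrate: str             # extender unit selected by the AT domain
    reductive: Tuple[str, ...]    # reductive domains present, in listed order

    def domains(self) -> Tuple[str, ...]:
        """Domains of the module: KS, AT, any reductive set, then ACP."""
        return ("KS", f"AT({self.at_substrate})", *self.reductive, "ACP")


def swap_at(module: Module, substrate: str) -> Module:
    """Replace the AT domain with one specific for another extender unit."""
    return replace(module, at_substrate=substrate)


def set_reductive(module: Module, domains: Tuple[str, ...]) -> Module:
    """Replace the whole reductive set at once (an empty tuple deletes it)."""
    unknown = set(domains) - REDUCTIVE_DOMAINS
    if unknown:
        raise ValueError(f"not a reductive domain: {sorted(unknown)}")
    return replace(module, reductive=tuple(domains))


# First extender module as described: KS, AT (methylmalonyl CoA), DH, KR, ACP.
module1 = Module("extender module 1", "methylmalonyl CoA", ("DH", "KR"))

# Inserting an ER by exchanging the existing DH and KR for a complete
# DH, ER, and KR set taken from another module, together with an AT swap.
hybrid = set_reductive(swap_at(module1, "malonyl CoA"), ("DH", "ER", "KR"))

print(module1.domains())
print(hybrid.domains())
```

Replacing the reductive domains as a complete set, rather than inserting a single domain, mirrors the preference stated above for taking the full KR, DH, and ER set from a module that already contains them.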

In an illustrative embodiment of this aspect of the invention, the invention provides 
recombinant PKSs and recombinant DNA compounds and vectors that encode such PKSs in 
which the KS domain of the first extender module has been inactivated. Such constructs are 
especially useful when placed in translational reading frame with the remaining modules 
and domains of an FK-520 or FK-520 derivative PKS. The utility of these constructs is that 
host cells expressing, or cell free extracts containing, the PKS encoded thereby can be fed or 
supplied with N-acylcysteamine thioesters of novel precursor molecules to prepare FK-520 
derivatives. See U.S. patent application Serial No. 60/117,384, filed 27 Jan. 1999, and PCT
patent publication Nos. US97/02358 and US99/03986, each of which is incorporated herein 
by reference. 

The second extender module of the FK-520 PKS includes a KS, an AT specific for 
methylmalonyl CoA, a KR, an inactive DH, and an ACP. The recombinant DNA 
compounds of the invention that encode the second extender module of the FK-520 PKS 
and the corresponding polypeptides encoded thereby are useful for a variety of applications. 
In one embodiment, a DNA compound comprising a sequence that encodes the FK-520 
second extender module is inserted into a DNA compound that comprises the coding 
sequence for a heterologous PKS. The resulting construct, in which the coding sequence for 
a module of the heterologous PKS is either replaced by that for the second extender module 
of the FK-520 PKS or the latter is merely added to coding sequences for the modules of the 
heterologous PKS, provides a novel PKS coding sequence. In another embodiment, a DNA 
compound comprising a sequence that encodes the second extender module of the FK-520 
PKS is inserted into a DNA compound that comprises the coding sequence for the 
remainder of the FK-520 PKS or a recombinant FK-520 PKS that produces an FK-520 
derivative. 

In another embodiment, all or a portion of the second extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a hybrid 
module. In this embodiment, the invention provides, for example, either replacing the 
methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or 2- 
hydroxymalonyl CoA specific AT; deleting the KR and/or the inactive DH; replacing the 
KR with another KR; and/or inserting an active DH or an active DH and an ER. In addition, 
the KS and/or ACP can be replaced with another KS and/or ACP. In each of these 
replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding
sequence can originate from a coding sequence for another module of the FK-520 PKS, 
from a coding sequence for a PKS that produces a polyketide other than FK-520, or from 
chemical synthesis. The resulting heterologous second extender module coding sequence 
can be utilized in conjunction with a coding sequence from a PKS that synthesizes FK-520, 
an FK-520 derivative, or another polyketide. In similar fashion, the corresponding domains 
in a module of a heterologous PKS can be replaced by one or more domains of the second 
extender module of the FK-520 PKS. 

The third extender module of the FK-520 PKS includes a KS, an AT specific for 
malonyl CoA, a KR, an inactive DH, and an ACP. The recombinant DNA compounds of the
invention that encode the third extender module of the FK-520 PKS and the corresponding 
polypeptides encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the FK-520 third extender module is 
inserted into a DNA compound that comprises the coding sequence for a heterologous PKS. 
The resulting construct, in which the coding sequence for a module of the heterologous PKS 
is either replaced by that for the third extender module of the FK-520 PKS or the latter is 
merely added to coding sequences for the modules of the heterologous PKS, provides a 
novel PKS coding sequence. In another embodiment, a DNA compound comprising a 
sequence that encodes the third extender module of the FK-520 PKS is inserted into a DNA 
compound that comprises the coding sequence for the remainder of the FK-520 PKS or a 
recombinant FK-520 PKS that produces an FK-520 derivative. 

In another embodiment, all or a portion of the third extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a hybrid 
module. In this embodiment, the invention provides, for example, either replacing the 
malonyl CoA specific AT with a methylmalonyl CoA, ethylmalonyl CoA, or 2-
hydroxymalonyl CoA specific AT; deleting the KR and/or the inactive DH; replacing the
KR with another KR; and/or inserting an active DH or an active DH and an ER. In addition, 
the KS and/or ACP can be replaced with another KS and/or ACP. In each of these 
replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding 
sequence can originate from a coding sequence for another module of the FK-520 PKS, 
from a coding sequence for a PKS that produces a polyketide other than FK-520, or from 
chemical synthesis. The resulting heterologous third extender module coding sequence can 
be utilized in conjunction with a coding sequence from a PKS that synthesizes FK-520, an 
FK-520 derivative, or another polyketide. In similar fashion, the corresponding domains in a
module of a heterologous PKS can be replaced by one or more domains of the third
extender module of the FK-520 PKS.

The fourth extender module of the FK-520 PKS includes a KS, an AT that binds 
ethylmalonyl CoA, an inactive DH, and an ACP. The recombinant DNA compounds of the
invention that encode the fourth extender module of the FK-520 PKS and the corresponding 
polypeptides encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the FK-520 fourth extender module is 
inserted into a DNA compound that comprises the coding sequence for a heterologous PKS. 
The resulting construct, in which the coding sequence for a module of the heterologous PKS
is either replaced by that for the fourth extender module of the FK-520 PKS or the latter is 
merely added to coding sequences for the modules of the heterologous PKS, provides a 
novel PKS coding sequence. In another embodiment, a DNA compound comprising a 
sequence that encodes the fourth extender module of the FK-520 PKS is inserted into a 
DNA compound that comprises the remainder of the coding sequence for the FK-520 PKS
or a recombinant FK-520 PKS that produces an FK-520 derivative. 

In another embodiment, a portion of the fourth extender module coding sequence is 
utilized in conjunction with other PKS coding sequences to create a hybrid module. In this 
embodiment, the invention provides, for example, either replacing the ethylmalonyl CoA 
specific AT with a malonyl CoA, methylmalonyl CoA, or 2-hydroxymalonyl CoA specific
AT; and/or deleting the inactive DH, inserting a KR, a KR and an active DH, or a KR, an 
active DH, and an ER. In addition, the KS and/or ACP can be replaced with another KS 
and/or ACP. In each of these replacements or insertions, the heterologous KS, AT, DH, KR, 
ER, or ACP coding sequence can originate from a coding sequence for another module of 
the FK-520 PKS, a PKS for a polyketide other than FK-520, or from chemical synthesis.
The resulting heterologous fourth extender module coding sequence can be utilized in 
conjunction with a coding sequence for a PKS that synthesizes FK-520, an FK-520 
derivative, or another polyketide. In similar fashion, the corresponding domains in a module 
of a heterologous PKS can be replaced by one or more domains of the fourth extender 
module of the FK-520 PKS.

As illustrative examples, the present invention provides recombinant genes, vectors, 
and host cells that result from the conversion of the FK-506 PKS to an FK-520 PKS and
vice-versa. In one embodiment, the invention provides a recombinant set of FK-506 PKS 
genes but in which the coding sequences for the fourth extender module or at least those for
the AT domain in the fourth extender module have been replaced by those for the AT 
domain of the fourth extender module of the FK-520 PKS. This recombinant PKS can be 
used to produce FK-520 in recombinant host cells. In another embodiment, the invention 
provides a recombinant set of FK-520 PKS genes but in which the coding sequences for the
fourth extender module or at least those for the AT domain in the fourth extender module 
have been replaced by those for the AT domain of the fourth extender module of the FK-
506 PKS. This recombinant PKS can be used to produce FK-506 in recombinant host cells.

Other examples of hybrid PKS enzymes of the invention include those in which the
AT domain of module 4 has been replaced with a malonyl specific AT domain to provide a
PKS that produces 21-desethyl-FK520 or with a methylmalonyl specific AT domain to 
provide a PKS that produces 21-desethyl-21-methyl-FK520. Another hybrid PKS of the 
invention is prepared by replacing the AT and inactive KR domain of FK-520 extender 
module 4 with a methylmalonyl specific AT and an active KR domain, such as, for
example, from module 2 of the DEBS or oleandolide PKS enzymes, to produce 21-desethyl-
21-methyl-22-desoxo-22-hydroxy-FK520. The compounds produced by these hybrid PKS
enzymes are neurotrophins. 
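
The correspondence between the substitution made in extender module 4 and the compound named above can be summarized as a lookup. The following sketch merely restates that paragraph; the table and function names are hypothetical, and combinations not described above are deliberately left unassigned.

```python
from typing import Optional

# (AT specificity in extender module 4, active KR present) -> named product.
# Only the combinations named in the description are listed.
MODULE4_PRODUCTS = {
    ("ethylmalonyl", False): "FK-520",
    ("malonyl", False): "21-desethyl-FK520",
    ("methylmalonyl", False): "21-desethyl-21-methyl-FK520",
    ("methylmalonyl", True): "21-desethyl-21-methyl-22-desoxo-22-hydroxy-FK520",
}


def predicted_product(at_specificity: str, active_kr: bool = False) -> Optional[str]:
    """Return the named product for a module 4 variant, or None when the
    combination is not one of those described."""
    return MODULE4_PRODUCTS.get((at_specificity.lower(), active_kr))


print(predicted_product("malonyl"))
print(predicted_product("methylmalonyl", active_kr=True))
print(predicted_product("malonyl", active_kr=True))  # not described: None
```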

The fifth extender module of the FK-520 PKS includes a KS, an AT that binds 
methylmalonyl CoA, a DH, a KR, and an ACP. The recombinant DNA compounds of the
invention that encode the fifth extender module of the FK-520 PKS and the corresponding
polypeptides encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the FK-520 fifth extender module is 
inserted into a DNA compound that comprises the coding sequence for a heterologous PKS. 
The resulting construct, in which the coding sequence for a module of the heterologous PKS
is either replaced by that for the fifth extender module of the FK-520 PKS or the latter is
merely added to coding sequences for the modules of the heterologous PKS, provides a
novel PKS. In another embodiment, a DNA compound comprising a sequence that encodes 
the fifth extender module of the FK-520 PKS is inserted into a DNA compound that 
comprises the coding sequence for the FK-520 PKS or a recombinant FK-520 PKS that
produces an FK-520 derivative.

In another embodiment, a portion of the fifth extender module coding sequence is 
utilized in conjunction with other PKS coding sequences to create a hybrid module. In this 
embodiment, the invention provides, for example, either replacing the methylmalonyl CoA 